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Abstract 


Two-sided  orthogonal  decompositions  (TSOD)  have  been  essential  tools  for  es¬ 
timating  the  numerical  rank  of  a  matrix  and  computing  various  important  subspaces 
including  the  range  (signal)  and  null  (noise,  error)  spaces.  They  include  partial  and 
complete  singular  value  decompositions  (SVD),  the  URV  decomposition  (URVD),  and 
the  ULV  decomposition  (ULVD). 

Given  the  TSOD  of  an  m-by-n  ( m  >  n)  matrix  A,  it  is  often  desirable  to  suc¬ 
cessively  add  a  new  row  to  A  and  to  compute  the  TSOD  of  the  modified  matrix.  This 
is  called  the  updating  problem.  The  opposite  computation,  the  downdating  problem, 
deletes  the  existing  row  from  A,  and  computes  the  TSOD  of  the  modified  matrix.  These 
problems  of  updating  and  downdating  can  be  transformed  into  those  of  modifying  a 
symmetric  positive  definite  matrix  by  a  rank-one  matrix. 

In  this  thesis  we  propose  several  algorithms  for  rank-one  updates  and  downdates 

to  these  decompositions  with  strong  stability  properties  and  efficient  implementations 

2 

on  high-performance  computers.  We  seek  algorithms  which  only  require  0(n  )  opera- 

3 

tions  per  update  or  downdate  unlike  recomputing  the  TSOD  in  0(n  ).  We  also  desire 
highly  regular  data  movement  inherited  in  these  algorithms  in  order  to  implement  these 
efficiently  on  the  distributed-memory  MIMD  multiprocessors.  The  algorithms  are  based 
upon  “chasing”  strategies  for  updating  and  downdating  procedures  for  orthogonal  de¬ 
compositions. 

In  modifying  the  SVD  and  partial  SVD,  our  algorithms  separate  singular  values 
into  “large”  and  “small”  sets  and  then  obtain  an  updated  bidiagonal  form  with  corre¬ 
sponding  “large”  and  “small”  columns.  This  makes  more  accurate  update  or  downdate. 


IV 


The  algorithm  can  be  implemented  almost  identically  for  both  updating  and  downdat¬ 
ing  by  reducing  the  problems  to  a  2  x  2  updating/downdating  problem.  Moreover,  the 
bidiagonal  reduction  phase  is  highly  paraUelizable.  A  perturbation  theory  for  modifying 
the  SVD  is  also  presented;  it  shows  that  the  computed  subspaces  associated  with  large 
and  small  singular  values  are  as  accurate  as  can  be  expected. 

An  alternative  to  performing  the  singular  value  decomposition  is  to  factor  a  matrix 
f  C  \  T 

into  A  =  U  I  q  jV  where  U  and  V  are  orthogonal  matrices  and  C  is  a  lower  triangular 
matrix  indicating  a  separation  between  two  subspaces  by  column  size.  These  subspaces 
are  denoted  by  V  =  (V  V  ),  where  the  columns  of  C  are  partitioned  conformally  into 
C  =  (C  Cg)  with  ||C2||^  <  €  .  Here  e  is  some  tolerance.  In  recent  years,  this  has  been 
called  the  ULVD.  A  downdating  algorithm  is  proposed  which  preserves  the  structure 
in  the  downdated  matrix  C  to  the  extent  possible.  Strong  stability  results  have  been 
proven  for  these  algorithms  based  on  a  new  perturbation  theory.  When  C  is  given  as  an 
upper  triangular  matrix,  we  have  the  URVD.  We  describe  algorithms  for  modifying  the 
URVD,  and  make  comparison  with  our  algorithms  for  modifying  the  ULVD  in  terms  of 
the  computed  subspaces. 

When  downdating  the  ULVD,  a  deflation  step  is  necessary  to  compute  its  nu¬ 
merical  rank.  We  propose  an  improved  algorithm  which  almost  always  guarantees  the 
rank-revealing  structure  of  the  decomposition  after  a  downdate  without  the  deflation 
process.  This  requires  some  condition  estimation.  Moreover,  one  can  monitor  the  con¬ 
dition  of  the  downdating  problem  by  tracking  exact  quantities  of  Frobenius  norms  of  all 
three  blocks  of  the  lower  triangular  factor  in  the  decomposition.  The  algorithm  is  also 
used  to  update  the  ULVD  with  a  slight  modification. 
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A  fully  parallel  algorithm  for  modifying  the  SVD  is  also  presented.  We  consider 
both  cyclic  and  consecutive  storage  schemes.  We  will  show  that  the  latter  scheme  out¬ 
performs  the  former  on  a  coarse-grain  distributed-memory  MIMD  multiprocessor  mainly 
due  to  high  communication  cost  required  by  the  former.  We  present  the  experimental 
results  on  the  32-node  Connection  Machine  (CM-5).  A  speed-up  of  20  and  the  efficiency 
of  CPU  utilization  60%  are  achieved  for  matrices  of  moderate  size. 

Our  algorithms  for  modifying  the  TSOD  offer  a  promising  approach  to  a  number 
of  problems  like  the  recursive  total  least  squares,  linear  regression,  the  subspace-based 
methods  for  signal  processing,  image  processing,  and  pattern  recognition.  These  prob¬ 
lems  require  a  real-time  solution  in  estimating  the  numerical  rank  of  the  data  matrices, 
and  orthonormal  basis  for  the  subspaces  associated  with  large  and  small  singular  values. 
Our  algorithms  are  capable  of  providing  those  answers  since  continual  updating  and 
downdating  are  required  by  the  underlying  physical  model. 
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Chapter  1 


Introduction 


1.1  Statement  of  Problem 


1.1.1  Two-Sided  Orthogonal  Decompositions  (TSOD) 


The  TSOD  of  an  m  x  n  matrix  A,  where  m>n ,  can  be  characterized  by  writing 
them  in  the  form 


A  =  U 


'ii' 


V  0  / 


(1.1) 


where  U  €  7£mXm  ancj  y  ^  7£nxn  are  orthogonal,  and  M  £  7 Znxn  has  one  of  the  fol¬ 
lowing  forms:  diagonal,  partially  reduced  bidiagonal,  upper  triangular,  lower  triangular. 
If  M  is  a  diagonal  matrix  of  the  form, 


M  =  diag  (av<r2,...,an) 


(1.2) 


where  we  presume 

ll(^+1>---CTn)ll2^e>  ak^td 

and  tol  is  the  user-supplied  tolerance,  (1.1)  is  called  the  singular  value  decomposition 
(SVD). 
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If  M  is  a  partially  reduced  bidiagonal  matrix  of  the  form, 


where 


k  n—k 


M  = 


>BX  0X 

0  £L 


k 

n—k 


IIS-1!!-1  >  toi,  p2||j,<t, 


(13) 


and  one  of  Bx  and  5  is  upper  bidiagonal,  and  the  other  diagonal,  (1.1)  is  called  the 
partial  singular  value  decomposition  (partial  SVD)  described  by  Van  Huffel  [118].  Here, 
we  presume  they  are  decoupled,  namely,  (fc,  k  +  1)  entry  of  M  is  zero. 

If  M  is  an  upper  triangular  matrix  of  the  form, 


k  n—k 


M  = 


^  R  S  ^ 


0  T 


k 

n—k 


(1.4) 


where 


\\(ST  TT)\\f  <  e,  H/E  1||2  1  >  tol. 


and  R  and  T  are  upper  triangular,  (1.1)  is  called  the  URV  decomposition  (URVD). 
If  M  is  a  lower  triangular  matrix  of  the  form, 


M  = 


k  n-k 

(  L  0  ^ 
F  G 


k 

n—k 


(1.5) 
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where 

||(FG)||f<£,  lli-’ll"1  >w, 

and  L  and  G  are  lower  triangular,  (1.1)  is  called  the  ULV  decomposition  (ULVD). 

Here  k  is  the  numerical  rank  of  A  and  e  <  \/n  —  k  *  tol.  We  use  ||  •  ||  to  denote 
the  Euclidean  norm  and  ||  •  ||^  to  denote  the  Frobenius  norm  of  a  matrix. 

The  SVD  has  been  one  of  the  most  important  tools  widely  used  in  a  number  of 
fields  of  science  and  engineering  for  decades,  mainly  because  it  offers  abundant  informa¬ 
tion  about  the  matrix  in  question.  The  SVD  has  many  benefits.  It  provides  orthonormal 
basis  for  important  subspaces  associated  with  the  matrix  including  the  range  (signal) 
and  null  (error,  noise)  spaces. 

The  URVD  and  ULVD  are  particular  cases  of  what  Lawson  and  Hanson  [70]  called 
HRK  decompositions.  Both  URVD  and  ULVD  were  introduced  by  Stewart  [101,  102],  as 
an  alternative  to  the  accurate  but  expensive  SVD.  Stewart  also  gave  methods  to  update 
these  decompositions  in  0(n  )  operations. 

In  fact,  all  of  these  decompositions  provide  valuable  information  about  the  data 
matrices.  Most  importantly  for  many  applications,  they  provide  the  orthonormal  basis 
for  the  range  and  null  spaces.  For  instance,  if  V  is  partitioned  according  to 

v  =  (v  v  ),  v  e  7inxk,  v  e  ftnx(n_fc)  (1.6) 

1  z  1  z 

then  it  is  not  difficult  to  see  that  the  columns  of  V  give  the  desired  orthonormal  basis 
for  the  approximate  null  space. 
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1.1.2  Modifying  the  TSOD 


We  are  interested  in  computing  the  TSOD  of  A  when  the  TSOD  of  A  is  known, 
where  for  updating, 


A  = 


/  a' 


\r  / 


(1.7) 


and  for  downdating, 


A  = 


(  T  \ 
r 


\  a  / 


(1.8) 


Here,  we  assume  appending  a  new  row  to  A  to  be  the  last  row  of  A ,  and  deleting  the 
first  row  of  A  when  downdating.  The  downdating  problem  is  considered  more  sensitive 
than  the  updating  problem  because  small  singular  values  of  A  tend  to  diminish  after 
a  downdate,  leaving  the  matrix  near  singular,  and  thus  can  be  unstable  [99].  On  the 
other  hand,  updating  increases  all  its  singular  values.  Clearly,  refactoring  the  whole 
decomposition  without  using  the  TSOD  of  A  is  not  practical;  it  requires  0{n  )  operations 
to  compute  any  TSOD. 


1.2  Problem  Formulation 

We  transform  the  updating/downdating  problem  into  the  rank-one  modification 
of  the  symmetric  eigenvalue  problem.  Since  from  (1.7)  and  (1.8), 


ata 


T  T 

A  A  +  prr 


=  VMTMVT  +  prrT 


=  V(MTM  +  pzzT)VT 
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where  p  >  0  for  updating  and  p  <  0  for  downdating,  and 


\ 

x 

w 


k 


n—k 


(1.9) 


Thus,  the  problem  of  modifying  the  TSOD  of  A  is  equivalent  to  the  following 

T 

eigenvalue  problem:  given  a  symmetric  positive  definite  matrix  A  A  with  known  eigen - 

T  T  T  T  T 

system  A1  A  =  VM  MV  ,  compute  the  eigensystem  of  M  M  +  pzz  ,  that  is,  to  find 

an  orthogonal  matrix  V  E  TZnxn  such  that 


MTM  +  pzzT  =  VMTMVT. 


(1.10) 


However,  we  do  not  form  the  explicit  product  A  A  because  of  possible  loss  of 
T 

information  in  forming  A  A.  Furthermore,  the  eigendecompositions  do  not,  in  general, 
preserve  the  block  structure.  Instead,  we  compute  orthogonal  matrices  U  E 
V  Ellnxn  such  that 


U 


1  u  X 
M 


\  °  / 


i/T  = 


M 


for  updating, 


(i.n) 


and 


U 


/  zTv  \ 

\  *  I 


vT  = 


•  0  X 


\M  / 


for  downdating, 


(1.12) 


where  M  is  bidiagonal  for  the  (partial)  SVD,  upper  triangular  for  the  URVD,  and  lower 
triangular  for  the  ULVD. 
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Then  A  is  given  by 


A  =  JUJT 


l  a  ^ 

M 


\  0  / 


VJ 


(1.13) 


where 


f  =  u  =  uaa.s(u,im_n_i), 


m— 1  1 

J=  0^,  for  updating, 

and 

1  m— 1 

J=  ^0  fordowndating. 

_  t  -  T 

In  theory,  M  M  +  pzz  remains  positive  semi-definite  after  downdating,  but  it 
is  not  always  true  in  finite  precision  floating-point  arithmetic. 

1.3  Importance  of  the  Problem 

Updating  and  downdating  are  important  in  signal  processing  and  statistical  appli¬ 
cations  as  new  observations  are  added,  and  the  old  observations  are  successively  deleted. 
They  can  efficiently  be  applied  to  problems  which  arise  in  a  number  of  applications: 
recursive  total  least  squares  problems  [23,  34],  linear  regression  [112,  123],  linear  predic¬ 
tion  [113],  pattern  recognition  [16],  system  identification  [60,  105],  spectral  estimation 
[17,  28,  68],  adaptive  beamforming  [76,  77,  96],  image  processing/restoration  [5,  7,  82], 
adaptive  filtering  [64],  direction  finding  [2,  73],  subspace-based  algorithms  in  signal  pro¬ 
cessing  such  as  MUSIC  (Multiple  Signal  Classification)  [94,  95]  and  ESPRIT  (Estimation 
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of  Signal  Parameters  via  Rotational  Invariance  Techniques)  [91,  92],  and  ocean  acoustic 
tomography  [110]. 

1.4  Issues  and  Concerns 

Algorithms  for  modifying  the  TSOD  must  have  at  least  the  following  features: 

Efficiency  The  algorithm  should  require  as  few  operations  as  possible,  for  example, 
0(n  ).  This  feature  would  make  it  possible  to  implement  the  algorithms  for  appli¬ 
cations  that  require  a  real-time  processing,  where  continual  updating/downdating 
of  the  decompositions  is  required. 

Stability  The  algorithm  should  produce  correct  answers  within  the  uncertainties  of 
the  given  data.  Therefore,  the  computed  solution  should  be  as  good  as  our  data 
warrants. 

Parallelism  It  should  be  easy  to  implement  the  given  algorithm  on  a  parallel  processor. 
It  is  desirable  to  develop  parallel  procedures  which  achieve  the  best  possible  load 
balance  and  minimize  the  communication  cost,  showing  high  efficiency  and  a  good 
speed-up  even  for  small  sized  problems. 

1.5  Basic  Approach  to  the  Problem 

Our  approaches  to  modifying  the  TSOD  use  ideas  from  “chasing”  algorithms 
[1,  93,  115,  125]  and  from  the  downdating  algorithm  due  to  Saunders  [46,  85].  Chasing 
algorithms  apply  a  series  of  Givens  rotations  from  both  sides  to  annihilate  all  components 
of  z,  reducing  M  to  a  desired  form.  These  algorithms  offer  highly  regular  data  movement, 
so  powerful  pipelining  strategies  can  be  used  on  parallel  computers.  A  systolic  array 
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implementation  of  a  scheme  with  similar  data  movement  patterns  to  this  one  (but  not 
similar  numerical  properties)  is  implemented  in  [117]. 

The  alternative  to  chasing  algorithms  for  modifying  the  SVD  is  that  of  finding 
the  zeroes  of  a  particular  spectral  function  [10,  25,  49,  53,  54,  67,  97].  However,  that 
approach,  as  yet,  does  not  allow  us  to  separate  the  singular  values  into  separate  blocks 
in  the  manner  discussed  in  Chapter  6. 

1.6  Main  Results 

The  following  are  the  main  results  of  this  thesis. 

•  Blockwise  procedures  for  modifying  the  TSOD  which  preserves  the  separation  be¬ 
tween  subspaces  associated  with  the  “large”  and  “small”  singular  values. 

•  An  error  analysis  of  these  procedures  demonstrating  that  the  subspaces  of  the 
modified  matrix  are  as  good  as  can  be  expected. 

•  Efficient  parallel  implementation  for  modifying  the  TSOD  that  incorporates  clever 
pipelining  strategies  using  highly  regular  data  movement  inherited  from  the  chasing 
algorithms. 

1.7  Review  of  Related  Work 

As  mentioned  in  Section  1.2,  problems  of  modifying  the  TSOD  can  be  viewed  as 
modifying  the  symmetric  positive  definite  matrix  followed  by  a  rank-one  matrix.  Gill, 
Golub,  Murray,  and  Saunders  [46]  considered  a  problem  of  modifying  the  decomposition 
of  a  matrix  following  a  rank-one  change,  where  they  showed  how  to  construct  recurrences 
for  the  product  of  Givens  rotations  in  order  to  modify  Cholesky  factor.  An  algorithm 
by  Saunders  was  also  described  for  downdating  QR  factorization,  which  has  become 
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a  backbone  of  a  number  of  downdating  algorithms  including  those  proposed  in  this 
dissertation. 

Bunch,  Nielsen  and  Sorensen  [26]  studied  a  problem  of  rank-one  modification 
of  the  symmetric  eigendecomposition  that,  in  turn,  gave  a  rise  to  their  algorithm  for 
updating  the  SVD  [25].  Their  method  was  based  on  solving  the  secular  equations. 

Dongarra  and  Sorensen  [38]  proposed  an  algorithm  that  always  computes  the 
eigenvalues  of  tridiagonal  matrices  with  high  relative  accuracy.  However,  when  eigen¬ 
values  are  clustered  together,  their  algorithms  had  difficulties  in  computing  numerically 
orthogonal  eigenvectors. 

An  improved  version  was  proposed  by  Sorensen  and  Tang  [97],  in  which  they 
incorporated  simulated  extended  precision  to  overcome  the  difficulties  of  previous  al¬ 
gorithm.  However,  using  the  simulated  extended  precision  made  the  algorithm  require 
IEEE  floating-point  arithmetic. 

With  careful  rearrangement  of  computations  in  solving  secular  equations,  Gu  and 
Eisenstat  [54]  have  succeeded  in  developing  a  backward  stable  algorithm  which  com¬ 
putes  numerically  orthogonal  eigenvectors  without  using  the  simulated  extended  preci¬ 
sion.  They  also  observed  that  by  using  the  fast  multipole  method  of  Carrier,  Greengard, 

2 

and  Roklin  [29,  30],  eigenvectors  can  be  computed  in  0(n  )  operations  as  compared 

3 

to  0{n  )  for  the  QR  algorithms  [47,  48].  They  applied  this  technique  further  to  sym¬ 
metric  tridiagonal  eigenproblems  [56],  bidiagonal  SVD  problem  [55],  and  the  problem  of 
downdating  the  SVD  [57]. 

To  this  end  several  chasing  strategies,  which  originated  from  Rutishauser  [93], 
have  been  proposed  for  updating  the  SVD  [27],  reducing  bordered  band  matrices  [1,  52, 
115],  and  their  parallel  versions  [116,  117],  and  two-way  chasing  scheme  by  Zha  [125]. 
Zha’s  algorithm  improved  conventional  one-way  chasing  procedures  by  about  a  factor  of 


2. 
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These  chasing  schemes  implemented  a  number  of  algorithms  for  modifying  the 
SVD  and  partial  SVD  [1, 14],  updating  the  URVD  [101]  and  ULVD  [102],  downdating  the 
URVD  [87]  and  ULVD  [13],  refinement  techniques  for  the  URVD  and  ULVD  [104,  100], 
and  modifying  the  ULLV  decomposition  of  two  matrices  [22,  75],  Parallel  versions  of 
some  of  these  algorithms  were  also  studied  in  [81,  103,  124]. 

1.8  Overview  of  the  Dissertation 

In  the  next  chapter,  we  review  some  fundamental  concepts  from  linear  algebra 
used  throughout  this  dissertation.  After  introducing  notations  and  basic  notions,  we 
discuss  briefly  various  types  of  orthogonal  decompositions  and  their  relations  to  the 
TSOD.  Since  we  assume  that  the  initial  TSOD  is  given  for  all  of  our  algorithms,  we 
describe  methods  for  computing  the  TSOD  as  well  as  their  numerical  properties.  We 
also  describe  the  recursive  total  least  squares  (RTLS)  problems  as  a  potential  application 
for  our  algorithms. 

Chapter  3  contains  a  detailed  description  of  basic  algorithms  frequently  used  in 
the  subsequent  chapters.  Some  of  algorithms  include  those  for  computing  and  apply¬ 
ing  a  Givens  rotation,  various  chasing  algorithms,  and  the  LINPACK  [37]  downdating 
algorithm.  We  also  give  special  treatment  for  2-by-2  updating/downdating  procedures. 

Chapter  4  discusses  methods  for  modifying  the  ULVD.  First,  we  present  a  detailed 
description  of  the  ULVD  downdating  algorithm.  An  error  analysis  for  this  algorithm  is 
also  given  to  verify  that  the  accuracy  of  the  computed  subspaces  for  large  and  small 
singular  values  is  assessed.  Finally,  we  give  numerical  tests  of  our  algorithm  in  the 
context  of  the  RTLS. 

An  improved  algorithm  for  downdating  the  ULVD  is  proposed  in  Chapter  5. 
When  downdating  the  ULVD  of  a  matrix,  a  deflation  step  is  necessary  to  compute  its 
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numerical  rank.  We  propose  an  efficient  algorithm  that  almost  always  guarantees  the 
rank-revealing  structure  of  the  decomposition  after  a  downdate  without  the  deflation 
process.  This  always  requires  some  condition  estimation.  Moreover,  we  show  how  to 
track  exact  quantities  of  Frobenius  norms  of  all  three  blocks  of  the  lower  triangular  factor 
in  the  decomposition  in  order  to  monitor  the  condition  of  the  downdating  problem.  The 
algorithm  can  also  be  used  to  update  the  ULVD  with  a  slight  modification. 

In  Chapter  6,  methods  for  modifying  the  SVD  and  partial  SVD  are  introduced. 
The  main  feature  of  these  methods  is  the  ability  to  separate  the  singular  values  into  large 
and  small  sets  and  then  obtain  an  updated  bidiagonal  form  with  corresponding  large  and 
small  columns.  A  perturbation  theory  for  updating  and  downdating  the  singular  value 
decomposition  is  also  presented. 

We  present  a  fully  parallel  algorithm  for  modifying  the  TSOD  in  Chapter  7.  Both 
cyclic  and  consecutive  storage  schemes  are  considered  in  parallel  implementation.  We 
show  that  the  latter  scheme  outperforms  the  former  on  a  coarse-grain  distributed-memory 
MIMD  multiprocessor.  We  give  the  experimental  results  on  the  32-node  Connection 
Machine  (CM-5). 

Finally,  we  give  our  conclusion  and  propose  future  work  in  Chapter  8. 
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Chapter  2 


Background 


In  this  chapter  we  present  briefly  important  concepts  from  linear  algebra  fre¬ 
quently  used  throughout  the  dissertation.  For  detailed  description  of  each  subject,  refer 
to  [50,  58,  89,  98,  106,  121,  122]. 


2.1  Notation  and  Basic  Notions 


We  use  the  following  notations  throughout  this  dissertation. 


n 


71 

n 


n 

mxn 


I 

n 


Set  of  real  numbers  denoted  by  lower  case  Greek  or  lower  case 
italic  if  there  is  no  confusion 

Set  of  real  n— vectors  denoted  by  lower  case  italic 

Set  of  real  m-by-n  matrices  denoted  by  upper  case  ITALIC  or 

upper  case  Greek  letters 

n-by-n  identity  matrix,  that  is,  I  =  where  e ^  = 

k—l  n—k 


O 

n 


span{a1? . . . ,  a^} 

range(A) 

null(i4) 


n-by-n  zero  matrix 
Transpose  of  A 

T 

Complex  conjugate  of  A 

Inverse  of  A  6  lZnXn ,  that  is,  A  ^  A  =  AA  =  / 

n 

{■LUhaiA^ 

{y  6  7lm  :  y  =  Ax,x  E  7Zn}  =  spanjc^, . .  .,0^} 

{x  E  nm  :  Ax  =  0} 


rank(A) 

A(A) 

<t(A) 


a 

max 


M) 


a  .  (A) 

min'  ' 


k(A) 


O(-) 


flop 


sign(x) 


dim(range(A)) 

Set  of  eigenvalues  of  A,  that  is,  {A  6  'R- :  Ax  =  Xx,  lZn} 

Set  of  singular  values  of  A 

i-th  singular  value  of  A  in  nondecreasing  order,  that  is,  cr.(A)  — 

Largest  singular  value  of  A 
Smallest  singular  value  of  A 

condition  number  of  A,  n(A)  =  a  ( A)/a  .  (4) 

g(n )  =  0(f(n))  if  there  exist  constants  c  and  TV,  such  that,  for 

all  n  >  TV,  we  have  g(n)  <  cf(n )  [3] 

A  floating  point  operation,  that  is,  the  amount  of  work  associated 

with  an  addition,  a  multiplication,  or  a  square  root 

Machine  unit 

x/\x\  if  x  ^  0;  1  if  x  =  0 


Definition  2.1  (Vector  Norms).  Leix  e  Hn.  Then 


Nlj  =  E  l*fl»  INI00  =  max|a:.|,  IMI2  =(£*?) 

*=1  1  \*=1  / 


Definition  2.2  (Matrix  Norms).  Let  A  e  TZmxn.  Then 


MIL  = 


M*l 


l 

m  n  \  2 


p  x^o  IfI 


M!If=  (EEKjI 


where  p  =  1,2,  oo. 
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We  use  ||  •  ||  to  denote  Euclidean  norm  ||  •  ||2>  and  ||  •  ||^  to  denote  Frobenius  norm.  It 
can  be  easily  shown  that  ^^(4)  =  ||.4||. 

Next,  we  devise  a  notion  of  distance  between  subspaces. 


Definition  2.3  ([50,  p.77]).  Let  W  =  (W^  W2)  and  Z  -  (Z1  Z2 )  be  orthogonal 
matrices  where  W  ,  Z  G  72nX ^  and  W ’  Z  G  72nX^”  /y  5’  —  range(W,)  and 

II  Z  Z  I  1 

S2  =  range(Z1),  then  dist^^)  =  W^Z^ \  =  /l  - 

Thus,  if  sin(0)  =  dist(S  ,5  )  for  some  6,  then  &  is  the  largest  angle  between  the 
two  subspaces. 

Tji  y  fl 

Definition  2.4.  A  matrix  A  G  72  is 


diagonal 

if  a. .  =  0, 
*3 

*  #  j; 

tridiagonal 

o' 

II 

\i  -  j\  >  i; 

upper  bidiagonal 

^p 

CS; 

II 

© 

i>  j  or  j  >  i  +  1; 

lower  bidiagonal 

p 

II 

a> 

i  <  j  or  j  <  i  —  1; 

upper  triangular 

if  a..  =  0, 
*3 

*  >  j; 

lower  triangular 

if  a. .  =  0, 

ij 

i  <  j; 

upper  Hessenberg 

if  a. .  =  0, 

i>  j  + 1; 

lower  Hessenberg 

i/a. .  =  0, 

ij 

*  <  i  —  i. 

Definition  2.5.  A  matrix  A  g  72”x”  is 
symmetric  if  A ^  =  A; 

positive  definite  if  x'  aF x  >0,  0  ^  x  G  72.”; 

T  T  n 

positive  semi-definite  if  x  A  x  >  0,  0  /  x  G  72  ; 

T  T 

orthogonal  if  A  A  =  A  A  =  I  ; 

permutation  if  A  =  (e  ,  ...,e  )  where  (s  , . . s^)  is 

In 

a  permutation  of  (1, . . . ,  n). 
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||  •  ||  and  ||  •  || jj,  are  orthogonally  invariant  norms,  that  is,  for  any  x  6  72n ,  A  € 
7 2mxn,  and  orthogonal  matrices,  Q  G  7 ^rnXTn  anc[  %  g  lZnXn,  we  have 


119*11  =  11*11,  119^11  =  NI,  \\Qaz\\  =  pH 


In  theory,  if  rank(.A)  =  fc  for  ^4  G  TZmxn,  then  we  have 


^>^>•••>^>0,  »t+,  =  -  =  <'n  =  o 


where  cr.  are  singular  values  of  j4  defined  in  (1.2).  However,  in  practice,  a,  is  not 

%  fcT  1 

exactly  equal  to  zero,  but  cr^+1  =  O(fi),  where  p  is  the  machine  unit.  Therefore,  we 
define  the  numerical  e  —  rank  to  be  the  largest  integer  k  such  that 


ek>t 


that  is, 


0,  >  cr 


fc-h 1 


meaning  that  there  exists  an  obvious  gap  between  the  singular  values.  A  number  of 
signal  identification  problems  assume  a  significant  gap  in  the  singular  value  spectrum  of 
the  data  matrix.  Thus,  approximate  range  and  null  spaces  associated  with  the  large  and 
small  singular  values  can  be  defined  accordingly. 


2.2  Stability  of  Algorithms 


Let  T{x)  be  a  function  of  the  input  data  x . 
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DEFINITION  2.6.  An  algorithm,  for  computing  T{x)  is  backward  stable  if  the  computed 
solution  T(x)  is  the  exact  solution  of  a  slightly  perturbed  problem  with  data  x. 

This  definition  is  similar  to  that  of  Bunch  [24].  Backward  stable  algorithms  are 
very  satisfactory  although  all  of  the  algorithms  proposed  in  this  thesis  for  modifying  the 
TSOD  are  not  backward  stable.  They  are  mixed  stable  as  defined  in  the  following  sense: 

Definition  2.7.  An  algorithm  for  computing  T(x)  is  mixed  stable  if  the  computed 
solution  T(x)  is  close  to  the  solution  of  a  slightly  perturbed  problem  with  data  x. 

Definition  2.7  is  used  in  the  context  of  modifying  orthogonal  decompositions  [21, 
83,  99],  and  downdating  least  squares  solutions  [20].  We  will  also  use  this  definition  of 
stability  when  we  analyze  our  algorithms  in  the  subsequent  chapters. 

2.3  Model  of  Computation 

The  machine  unit  p  is  the  smallest  number  which  satisfies 

\fl(aopb)  —  (aop6)|  <  p  |aop6|  (2-1) 

where  op  is  one  of  the  four  arithmetic  operations,  +,  — ,  x,  -r,  and  fl(aopb)  is  the  floating¬ 
point  representation  of  the  exact  result  a  op  b.  We  take  the  usual  model  of  arithmetic, 
assuming  underflow/overflow  does  not  occur, 

fl(aopb)  =  (aop6)(l  +  £)>  l£l  <  (2-2) 

Most  computers  take  this  model  except  for  those  without  guard  digits  such  as  Cray  for 
which 

fl(aopb)  =  a(l  +  fj) op 6(1  +  £2),  I^M^I  <  M 
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With  this  model  we  may  obtain  unsatisfactory  results  in  addition  and  subtraction. 

We  also  require  for  our  model 

fi(Vi)  =  VZ(  1  +  0,  \t\<n,  *>o-  (2.3) 

2.4  Orthogonal  Factorizations 

Orthogonal  decompositions  play  an  important  role  in  a  number  of  problems  in 
matrix  computations  such  as  least  squares  and  eigenvalue  problems.  In  this  section 
we  review  two  special  orthogonal  transformations:  Householder  reflections  and  Givens 
rotations,  and  a  family  of  orthogonal  decompositions  that  can  be  computed  by  a  series 
of  application  of  these  transformations. 

2.4.1  Householder  Transformation 

A  Householder  transformation  (reflection)  of  order  n  takes  the  form 

H  =  I  -  2vvT/vTv  (2.4) 

where  v  is  called  a  Householder  vector.  It  is  easy  to  verify  that  H  is  symmetric  and 
orthogonal.  The  Householder  transformation  can  be  used  to  annihilate  a  number  of 
components  of  a  vector.  For  example,  let  x  €  7£n.  If  we  choose  v  such  that 

v.  =  0,  i  =  1, . . k  —  1 
% 

vk  =  xk~ s’ 


i  =  k  +  l,...,n 
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where 

»  =  -sign(*t)  (s?=t*j)!, 

then  we  obtain 

T 

H  X  —  * 

Note  that  we  always  choose  s  so  that  5  and  x ^  can  have  opposite  sign  to  prevent  the 
loss  of  precision  in  the  computation. 

2.4.2  Givens  Transformation 

When  one  wishes  to  zero  elements  more  selectively,  Givens  transformations  (ro¬ 
tations)  do  the  best  job.  A  Givens  rotation  takes  the  form 


where  c  =  cos(0)  and  s  =  sin(0)  for  some  6 . 
Given  a  vector  x  £  7£n,  if  we  choose 
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then  we  obtain 

J(i,j,e)Tx  =  (x1 . . xt_v0,zk+y...,zjr. 

Algorithm  3.2  described  in  the  next  chapter  computes  c  and  s  without  causing  overflow 
and  underflow  in  computing  p. 

2.4.3  Rank  Revealing  QR  Factorization 

The  QR  factorization  of  an  m-by-n  matrix  A  is  given  by 

n 

(2.5) 

m—n 

Q  =  (Q1Q2),  q1  enmxn,  Q26Kmx(rn~n) 

where  Q  G  'Jlmxm  is  orthogonal  and  R  G  TZnXn  is  upper  triangular. 

If  A  has  full  column  rank,  the  columns  of  Q  spans  range(A).  However,  when  A 
is  near  rank-deficient,  and  computing  orthonormal  bases  for  range(A)  and  null(A)  is  of 
interest,  the  factorization  of  the  form  (2.5)  is  obviously  inadequate. 

It  can  be  shown  that  with  a  careful  rearrangement  of  columns  of  A  with  some 
pivoting,  one  can  produce  another  QR  factorization  which  reveals  the  numerical  rank  of 
A.  A  rank- revealing  QR  (RRQR)  factorization  of  A  G  7 zmxn  is  any  decomposition 


A  =  Q 


1  R  ' 


i  w  I 


n—k 


lR  n  \ 
^11  "12 


AU  =  Q 


0  R. 
0 


22 

0 


A: 

n—k 

m—n 


(2.6) 
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Q  =  (Q1  Q2), 


Q^n1 


^mx(m—n) 


where  R ^  and  i?22  are  upper  triangular,  R ^  is  well- conditioned,  ||i2~1||~1  «  cr^A), 
H^22 II  ~  anc^  ^  *s  a  Permiltation  matrix.  Here,  k  is  the  numerical  rank  of  A. 

Note  that  ||.R  ||  ls  not  necessarily  small  compared  to  ||iZjj||.  Thus,  the  factorization 

does  not  produce  explicitly  the  approximate  null  space  of  A. 

Chan  [31],  however,  shows  that  RRQR  factorization  may  produce  the  approximate 
null  space  for  A  with  high  accuracy  when  there  is  a  large  gap  in  the  singular  value 
spectrum  of  A  although  that  may  not  be  a  practical  assumption  in  some  applications. 
His  algorithm  incorporates  several  techniques  such  as  an  efficient  condition  estimation 
and  column  pivoting.  His  algorithm  also  guarantees  an  RRQR  factorization  for  a  high- 
rank  problem. 

Using  similar  ideas,  Foster  [44]  independently  developed  a  stable  algorithm  for 
determining  the  numerical  rank  of  a  matrix  without  requiring  column  interchanges.  Hong 
and  Pan  [62]  recently  showed  that  the  permutation  n  always  exists,  and  gave  a  method 
for  constructing  such  permutation.  However,  because  of  the  high  cost  required  by  the 
procedure,  it  has  more  theoretical  than  practical  value. 


2.4.4  Complete  Orthogonal  Decompositions 

It  turns  out  that  with  an  appropriate  choice  of  a  general  orthogonal  matrix  Z  £ 
7 Znxn  (considering  permutation  matrices  as  a  special  class  of  orthogonal  matrices),  one 
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can  reduce  R  in  (2.6)  even  further  so  that  ||/2  ||  becomes  small  as  well: 


k  n-k 


*11  *12  * 

AZ  —  Q  0  R  n—k  , 

JLl 

0  0  /  rn-n 


z  =  (zl  z2),  z1  ennxk,  z2  eftnx(n 


where  *  w  ©^(A),  an<^ 


"  »w(  A). 


Then,  it  is  easy  to  see  that  columns  of  Z  provides  the  orthonormal  basis  for  the  ap¬ 


proximate  null  space. 


2.5  Computing  the  TSOD 


2.5.1  Computing  the  Singular  Value  Decomposition 

A  standard  way  of  computing  the  SVD  of  A  €  7£mxn  involves  two  steps:  bidiag- 
onalization  and  computation  of  the  SVD  of  resulting  bidiagonal  matrix.  The  first  step 

771 X  771 

requires  to  find  products  of  Householder  transformations,  U  =  U  £  72 

and  V  =  V  •  •  •  V  ,6  72nxn  such  that 
1  n-1 


U1  AV  = 


B  n 


0  I  m~n 
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where  B  is  upper  bidiagonal.  Then,  we  compute  orthogonal  matrices  P,Q  £  7 znxn  such 
that 

B  =  PT,QT,  E  =  diag(cr1, . . 


Thus  we  obtain 


A  =  X 


\0/ 


where  X  =  U  diag (P,  I  )  and  Y  =  VQ. 

ov  ra—rr 

2 

The  whole  process  requires  0(m  7i)  flops.  The  second  phase  of  computing  the 
SVD  of  bidiagonal  matrix  can  be  done  by  QR-iteration  [47,  48]  which  is  implemented 
in  the  LINPACK  [37].  But  the  singular  values  computed  by  this  method  differ  from  the 
true  singular  values  by  at  most  p(n)  •  p  •  a  (A),  where  p(n )  is  a  moderately  growing 
function  of  n.  Thus,  large  singular  values  are  computed  with  high  relative  accuracy,  but 
small  ones  are  not  generally  accurate. 

Demmel  and  Kahan  [35]  developed  an  algorithm  for  computing  all  the  singular 
values  of  a  bidiagonal  matrix  to  maximal  relative  accuracy  independent  of  their  mag¬ 
nitudes.  Their  algorithm  implemented  in  LAPACK  [6],  is  essentially  the  QR-iteration 
incorporated  with  a  “zero-shift”,  which  is  often  faster  than  the  standard  algorithm  im¬ 
plemented  in  the  LINPACK. 

Fernando  and  Parlett  [40]  simplified  the  zero-shift  bidiagonal  algorithm  by  Dem¬ 
mel  and  Kahan  even  further  by  replacing  a  zero-shift  QR  step  with  two  steps  of  LR 
iteration  that  implement  the  qd  algorithm.  We  describe  the  qd  algorithm  in  the  next 
chapter. 

Other  methods  for  computing  all  the  singular  values  with  high  relative  accuracy  of 
a  bidiagonal  matrix  include  bisection  [15],  Rayleigh  quotient  iteration  [88].  But  they  are 
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not  competitive  in  speed  with  the  zero-shift  bidiagonal  algorithm  and  the  qd  algorithm, 
although  they  are  probably  the  most  parallelizable  algorithms  for  this  problem. 

It  is  a  well-known  fact  that  reducing  a  dense  matrix  into  bidiagonal  form  can 
introduce  large  relative  errors  in  its  singular  values.  The  Jacobi  method  for  computing 
the  SVD  of  a  dense  matrix  is  much  slower  but  more  accurate  than  any  algorithms  that 
first  bidiagonalize  the  matrix  [36].  In  this  iterative  algorithm,  a  series  of  Givens  rotations 
are  applied  to  pairs  of  rows  and  columns  to  reduce  off-diagonal  entries.  With  a  clever 
ordering  [74],  the  algorithm  can  be  implemented  in  parallel,  being  competitive  in  speed 
with  other  methods. 

2.5.2  Computing  the  ULV  and  URV  Decompositions 

The  ULVD  of  A  E  TZmxn  can  be  obtained  by  computing  its  QL  factorization 


where  Q  E  is  orthogonal  and  L  E  lZnxn  lower  triangular,  followed  by  computing 

the  ULVD  of  L  using  the  deflation  technique  described  in  [100,  102].  First,  we  estimate 
an  approximate  left  singular  vector  of  unit  norm  of  L  which  corresponds  to  cx^(T) 
using  some  condition  estimator.  A  survey  of  popular  condition  estimators  is  given  in 
[61].  Then,  we  compute  an  orthogonal  matrix  Q  E  TZnxn  such  that  u^Q  =  and  an 
orthogonal  matrix  P  E  7£nXn,  such  that 

ajl)  =  ||  VtH  =  \\(uTnQ)(QT  LP)\\  =  \\el(QTLP)l 
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which  is  the  size  of  the  last  row  of  L.  Here,  P  is  applied  to  restore  Q  L  into  the  lower 
triangular  form.  We  repeat  this  deflation  process  until  all  the  small  rows  of  L  appear  in 
the  decomposition  yielding  the  ULVD  of  A  of  the  form  (1.5). 

Similarly,  the  URVD  of  A  can  be  obtained  by  computing  its  QR  factorization  of 
the  form  (2.5)  followed  by  computing  the  URVD  of  R  using  the  deflation  steps.  This  time 
we  estimate  an  approximate  right  singular  vector  of  unit  norm  of  R  which  corresponds 

—  71X71  ~T 

to  cr^(iZ),  and  then  compute  an  orthogonal  matrix  Q  ER  such  that  Q  and 

an  orthogonal  matrix  P  E  RnXn,  so  that 

an(R)  =  HJtoJI  =  ||(PTfl<3)(er»„)||  =  ||  (PTRQ)en\\ 

which  is  the  size  of  the  last  column  of  R.  Here,  P  is  applied  to  restore  RQ  into  the 
upper  triangular  form.  We  repeat  this  deflation  process  until  all  the  small  columns  of  R 
appear  in  the  decomposition  yielding  the  URVD  of  A  of  the  form  (1.4). 


2.6  Subspaces  from  the  URV  and  ULV  Decompositions 

In  this  section  we  present  error  bounds  for  accessing  the  accuracy  of  subspaces 
computed  by  the  TSOD,  particularly,  the  ULVD  and  URVD.  The  discussion  that  follows 
is  largely  abridged  from  that  of  Fierro  and  Bunch  [41,  42]. 

Let  the  SVD  of  A  E  7Zmxn,  m  >  n  be 


E 


\ 

/ 


f 


A  =  U 


0 


(2.8) 
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where 


k  n—k  m—n 


k  n—k 


v=  (",  u2  *,).  y-  ( y ,  *i). 


and  £  has  the  form  (1.2),  and  let  the  URVD  of  A  be 


A  =  U 


R 


(  C  N 

CR 

\  °  / 


VT 
L  ’ 


(2.9) 


where 


k  n—k  m—n 


( 


Vm  VR2  VRZ 


)• v*-  ( 


k  n—k 


V  V 
R\  R2  k 


and  C ^  has  the  form  (1.4),  and  let  the  ULVD  of  A  be 


CT 

-4  =  V  L 

'  0 


VT 
L  ’ 


(2.10) 


where 


k  n—k  m—n 


UL=  Kl  UL2  U L7> 


)'  F‘  =  ( 


k  n-k 


V  V 
LI  L2 


and  C ^  has  the  form  (1.5).  Here,  Z7,  U ^  U ^  V,  V^,  and  V ^  are  orthogonal  matrices. 

Then  we  have  the  following  bounds  on  subspaces  computed  by  the  URVD  and 
ULVD,  which  are  associated  with  the  large  and  small  singular  values. 
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Theorem  2.1  ([42]).  Let  A  £  nmXn  have  the  SVD  of  the  form  (2.8)  and  the  URVD 
of  the  form  (2.9).  Suppose  o  .  ( R )  >  ||T||.  Then 


11*11 


m\+\\n 


<  \\v;  vm\\  <  7 


a  .  (R)  II^H 
minv  7  11  11 


min 


w-imr 


(2.11) 


II  u*u 


R\"  ~ 


Mimi 


<inw-imr 


(2.12) 


Theorem  2.2  ([42]).  Let  A  £  nmxn  have  the  SVD  of  the  form  (2.8)  and  the  ULVD 
of  the  form  (2.10).  Suppose  a  .  ( L )  >  ||G||.  Then 


iufy  <  -jJ3i  iigh 


a  .  (L)  -  ||G||‘ 
mmv  7  11  11 


(2.13) 


—  <  \\uTu.a\  < 


Ill’ll 


U\\  +  Ill'll  ~  "  2  L1  Vn^-INI 


(2.14) 


From  these  theorems  we  immediately  realize  that  the  quality  of  subspaces  com¬ 
puted  by  the  URVD  and  ULVD  do  not  depend  on  the  gap  in  the  singular  value  spectrum, 
which  is  required  by  the  RRQR  decomposition.  We  also  observe  that  by  keeping  ||5||  and 
||jF||  small,  we  can  obtain  highly  accurate  subspaces.  Several  ways  of  achieving  this  are 
described  in  [41,  42,  104].  It  can  be  also  argued  that  the  ULVD  computes  the  numerical 
null  spaces  more  accurately  than  the  URVD,  whereas  the  URVD  yields  a  better  estimate 
of  the  numerical  range. 
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2.7  Total  Least  Squares  Problem 

2.7.1  Problem  Formulation 

The  classical  linear  least  squares  (LS)  problem  is  given  by 


min^  ||Aa:  -  6||,  A  G  Hmxn,  b  G  7 Zm  (2.15) 


or  equivalently, 

min  llell,  e  G  TZm 
6+e€range(j4) 


(2.16) 


where  A  is  data  matrix  whose  rows  contain  measurements  from  the  model  under  con¬ 
sideration,  and  b  is  the  observation  vector.  Here,  we  presume  A  is  free  of  error  and  b 
is  subject  to  error.  However,  it  is  unrealistic  to  take  error-free  measurements  from  the 
model. 

The  total  least  squares  (TLS)  problem  assumes  that  there  are  errors  in  the  data 
matrix  A  as  well  as  the  observation  vector  6.  The  TLS  problem  has  the  formulation 


min 

6+eGrange(J4-f-£') 


\\(E  e)||F, 


(2.17) 


which  is  analogous  to  (2.16). 

The  errors-in-variables  problems  have  a  long  history  in  statistical  literature.  In 
the  field  of  numerical  analysis,  this  problem  was  first  introduced  by  Golub  and  Van 
Loan  [51]  and  then  studied  extensively  by  Van  HufFel  et  al.  [114,  118].  If  eTC  and 
( ETLS  eTLS ^  are  ^  an^  TLS  corrections  to  (2.16)  and  (2.17),  respectively,  it  can 
be  shown  that 

W(ETLS  eTLS^F  ~  II eLS ^ 
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2.7.2  Basic  Solution 

The  TSOD  gives  an  elegant  way  of  solving  the  TLS  problem.  Suppose  the  TSOD 
of  (A  6)  £  7£mx(n+1)  is  given  by 


(A  b)  =  U 


\ 0  / 


VT,  v  = 


k  n-k+1 


where  U  £  7 zmxm  and  V  €  are  orthogonal,  and  M  G  7^(”+1)x(n+1)  has 

one  of  the  forms  (1.2)— (1.5).  Then  V ^  .  •  •>  vn+1) is  a  basis  of  the  noise  subspace 

of  ( A  b ),  and  the  minimum  norm  TLS  solution  is  given  by  computing  a  Householder 
transformation  H  £  such  that 


V2H  = 


Y  d 
0  6 


(2.18) 


If  6  ^  0,  the  TLS  solution  x  is  given  by 


x  =  -d/S.  (2.19) 

See  [118]  for  details.  In  particular,  Golub  and  Van  Loan  formulated  the  solution  based 
on  the  SVD.  Van  Huffel  and  Zha  [119]  also  formulated  the  solution  to  the  TLS  problems 
based  on  the  URVD  and  the  ULVD  without  the  explicit  computation  of  the  approximate 


null  space  basis  V  , 
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2.7.3  Recursive  Total  Least  Squares 


When  one  does  not  have  complete  knowledge  of  the  data  in  a  given  model,  recur¬ 
sive  procedures  make  it  possible  to  achieve  satisfactory  results.  The  algorithm  is  given, 
to  begin  with,  incomplete  knowledge  about  the  environment,  and  modifies  the  processing 
model  in  an  adaptive  fashion  as  data  are  received  sequentially. 

In  recursive  TLS  problem,  it  is  required  to  append  a  new  row  to  the  data  matrix 
A  and  an  observation  to  b,  and  the  new  information  must  be  incorporated  into  the  TLS 
solution.  It  is  also  desirable  to  delete  the  oldest  observation  from  the  existing  data. 

The  updating  and  downdating  problems  of  the  form  (1.7)-(1.8)  are  associated 
with  the  sliding  window  method  [4,  19,  32].  At  each  step  of  the  sliding  window  method 
with  the  window  size  s,  an  s  X  n  data  matrix  is  constructed  from  an  mx  n  observation 
matrix  A  by  adding  a  new  row  to  the  data  matrix  in  the  previous  window  and  deleting 
the  oldest  row  from  it.  In  step  j,  the  row  s  +  j  of  the  observation  matrix  is  added  and 


the  row  j  is  deleted,  giving  the  data  matrix  A.\ 
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An  alternative  approach  is  to  use  an  exponential  forgetting  factor  /?  (0  <  (3  <  1) 
[59].  In  this  approach  the  modified  matrix  in  (1.7)-(1.8)  is  given  by 

(2.20) 

Here,  the  effect  of  old  observation  diminishes  exponentially  as  continuous  updating  is 
required.  However,  the  explicit  removal  of  the  observation,  as  in  the  sliding  window 
method,  makes  it  simple  to  estimate  the  rank  of  modified  matrix,  that  is,  it  remains  the 
same  or  increases  by  one  for  updating,  or  decreases  by  one  for  downdating.  Thus,  an 
indefinite  number  of  condition  estimation  steps  is  not  necessary  for  rank  detection  as 
done  in  [101,  102], 
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Chapter  3 

Basic  Algorithms 

This  chapter  contains  detailed  description  of  basic  algorithms  frequently  used  in 
the  subsequent  chapters.  Throughout  the  dissertation  we  follow  the  convention  of  the 
Matlab  [78]  in  describing  the  algorithms.  Some  of  the  algorithms  include  those  for 
computing  and  applying  a  Givens  rotation,  various  chasing  algorithms,  and  the  LIN- 
PACK  [37]  downdating  algorithm.  We  also  give  a  special  treatment  for  2-by-2  updat¬ 
ing/downdating  procedures. 

3.1  Givens  Rotations 

This  section  contains  simple  routines  for  constructing  and  applying  Givens  rota¬ 
tions  described  in  Section  2.4.2. 

Algorithm  3.1.  This  function  computes  complex  absolute  value  of  x  +  iy,  that  is, 

t  =  JJW. 

function  t  =  cabs(x,y) 
if  |x|  >  |y|  then 

r  <—  y/x ;  t  <—  |x|  *  \f  1  +  r ^ 
else 

r  *-  x/y;  t  *-  |y|  *  yl  +  r2 

endif 


end 
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Algorithm  3.2.  This  function  computes  c  =  cos(O)  and  s  =  sin(0)  for  some  6  such 
that 


l  c 


\T/ 


C)  \ 


\ 


a 


cabs(a,  b) 


0 


J 


procedure  formrot(a,  6,  c,  s) 
if  |a|  >  |6|  then 

r  <—  6/a;  factor  <—  \fl  + 
a  <—  |a|  *  factor-,  c  <—  1/ factor;  s  *—  r  *  c 

else 

r  a/6;  factor  <—  \/l  + 
a  <—  |6|  *  factor ;  s  «—  1/ factor;  c  <—  r*  s 

endif 

end 


Algorithm  3.3.  This  procedure  applies  the  plane  rotation  defined  by  c  and  s  to  rotate 

,  T  TfT 

(x  ,y  )  . 

procedure  applyrot(a;,  y,c,s,n) 
temp  <—  c  *  ®(1:  n)  +  s  *  y(l:  n) 

y(l:  n)  < - s  *  x(l:  n)  +  c  *  y(  1:  n) 

x{l\n)  <—  temp 


end 
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3.2  Chasing  Algorithms 

3.2.1  A  Chasing  Routine  for  a  Bidiagonal  Matrix 

The  following  routine,  for  a  given  vector  z  and  a  diagonal  matrix  B,  finds  orthog¬ 
onal  matrices  U  and  V  such  that 

B  =  UTBV,  VTz  =  pev  p=\\z\\  (3.1) 


where  B  is  lower  bidiagonal. 

Fig.  3.1  illustrates  the  reduction  steps.  The  right  arrows  — »  denote  rotations 
from  the  left  on  two  particular  rows,  whereas  the  down  arrows  H  denote  rotations  from 
the  right  on  two  particular  columns.  Here  z  denotes  elements  in  the  vector  z ,  b  denotes 
elements  in  B,  and  z  or  b  denotes  an  element  about  to  be  zeroed  out.  We  now  formally 
present  the  algorithm  below. 

Algorithm  3.4  (Chasing  algorithm  for  bidiagonal  reduction).  Given  a  diag¬ 
onal  matrix,  B  =  diag(7(l:  n)),  and  a  vector  to  be  reduced,  z(l:  n),  the  following  chasing 
scheme  produces  a  lower  bidiagonal  matrix  B  such  that  B  =  bidiag(7(l:  n),  n  -  1)) 
and  satisfies  (3.1). 

procedure  forchase(7,  0,  z,n) 
formrot  (z(n  —  1),  z(n),  cn,  sn ) 

e  < - sn  *  7(71  —  1);  7(77  —  1)  <—  cn  *  7(71  —  1) 

4>(n  —  1)  sn  *  7(71);  7(71)  <—  cn  *  7(71) 
formrot  (7(71),  e,  cn,sn) 
applyrot  (cn,sn,<f)(n  —  l),7(n—  1),1) 
for  i  « —  n  —  2, . . . ,  1 


formrot  ( z(i ),  z(i  +  1),  cn,  sn) 
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Fig.  3.1.  Forward  Chasing  Procedure  for  the  Bidiagonal  Reduction 


35 


e  < - sn  *  7(1);  7(1)  *-  cn*  7 (i) 

( j){i )  sn  *  7  (i  +  1);  7  (i  +  1 )  <—  cn*  7  (i  +  1) 

d  *—  cn*  4>{i  +  1);  <f>(i  +  1)  <—  sn  *  (f>(i  +  1) 
formrot  (7 (i),  e,  cn,  sn) 
applyrot  (cn,sn,<f>(i),j(i  +  1),  1) 
applyrot  (cn,  sn,  d,  4>(i  +  1),  1) 
for  j  <—  i, . . . ,  n  —  3 

formrot  (<t>(j),  d,  cn,  sn) 
applyrot  (cn,  sn, 7 (j  +  1),  4>{j  +  1),  1) 
e  <-  sn*  7  (j  +  2);  7  (j  +  2 )  <-  cn  *  7  (j  +  2) 
formrot  (<£(j),  d,  cn,  sn) 
applyrot  (cn, sn,  j(j  +  1),  <j>(j  +  1),  1) 
d  *-  sn*  <f>(j  +  2);  <j)(j  +  2 )  *—  cn*  <f>(j  +  2) 
endfor 

formrot  {<f>{n  —  2),  d,  cn,  sn) 
applyrot  (cn,  sn, 7(n—  l),<^>(n  —  1),1) 
e  <—  sn  *  7(n);  7(n)  <—  cn*  7  (n) 
formrot  (7(n  —  1),  e,  cn,  sn) 
applyrot  (cn,sn,<f>(n-  l),7(n),  1) 
endfor 
end 


This  algorithm  constructs  and  applies  n 


2 


—  n  Givens  rotations.  A  similar  proce¬ 


dure  backchase  to  that  illustrated  above  can  be  used  to  produces  orthogonal  matrices 
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U,VeTlnxn  such  that 


B  =  UTBV,  VTz  =  pen,  p  =  \\z\\ 


(3.2) 


where  B  is  upper  bidiagonal. 

For  the  sake  of  brevity,  we  do  not  present  backchase.  It  is  computed  by  reversing 
the  two  vectors  7(1: »)  and  1),  performing  forchase  and  reversing  the  vectors 

back.  The  algorithm  would  just  have  the  loop  in  forchase  go  backward  instead  of 
forward. 

3.2.2  A  Chasing  Routine  for  a  Lower  Triangular  Matrix 

We  now  describe  a  simple  chasing  routine  for  a  lower  triangular  matrix.  This 
routine,  for  given  a  vector  z  and  a  lower  triangular  matrix  C,  finds  orthogonal  matrices 
U  and  V  such  that 

C  =  UTCV,  VTz  =  pev  p  =  \\z\\  (3.3) 

where  C  is  lower  triangular. 

Consider  the  4x4  case  in  Fig.  3.2.  We  state  the  procedure  lchase  formally  below. 

Algorithm  3.5.  Given  a  lower  triangular  matrix  C  and  the  updating  vector  z,  this 
procedure  performs  a  chasing  operation  on  C ,  and  produces  a  lower  triangular  matrix 
C  that  satisfies  (3.3). 

procedure  lchase(c,  z,  n) 
for  i  *-  n  —  1, . . .,  1 

formrot  (z(i),z(i+  l),cn,sn)] 

e  * - sn  *  c(i ,  i);  c(i ,  i)  +-cn*  c(»,  i); 

applyrot(c(i  +  1:  n,i),c(i+  1:  n,i  +  l),cn,  sn,n—  i); 
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Fig.  3.2.  Chasing  Steps  for  a  Lower  Triangular  Matrix 


formrot  (c(i  +  1,  i  +  1),  e,  cn,  sn )  ; 
applyrot(c(i  +  1,1:*),  c(i,  1:  i),  cn,  sn,  i)\ 

endfor 

end 


A  similar  chasing  procedure  can  be  specified  for  an  upper  triangular  matrix  when 
modifying  the  URVD.  Stewart  [102]  points  out  that  if  the  matrix  C  is  from  a  rank 
revealing  decomposition  with  k  large  rows  and  n  —  k  small  rows,  this  algorithm  can  yield 
k  +  1  large  rows,  thus  the  rank  revealing  nature  of  C  may  be  lost. 

3.3  qd  Procedure 

It  is  easy  to  find  an  orthogonal  matrix  Q  as  a  product  of  Givens  rotations  such 

that 


B  =  QB 
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where  B  is  lower  bidiagonal  and  B  is  upper  bidiagonal.  This  is  the  same  as  one  unshifted 
Fernando-Parlett  qd  step  [40],  hence  the  name.  Fig.  3.3  illustrates  the  reduction  steps 
for  a  4  x  4  case. 
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(b  b  \ 
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b  b 

-*  [  k 

b  b 

b  b 

b  b 

-  1  b  b 

-  b 
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b  b) 

\  b  b) 

-V  b  b) 

V  b) 

Fig.  3.3.  One  Step  of  qd  Procedure 


Algorithm  3.6.  This  procedure  produces  the  orthogonal  factorization  of  a  lower  bidi¬ 
agonal  matrix.  7(1:  n )  is  the  diagonal  on  input  and  output.  (j>(l:  n  —  1)  is  the  subdiagonal 
on  input  and  the  superdiagonal  on  output. 

procedure  qd  (7,^,  n) 
for  i  «—  1, . .  .  ,n  —  1 

formrot(7(i),  cn,  sn) 

(f)(i)  <—  sn  *  7 (i  +  1) 

7 (i  +  1)  <—  cn  *  7  (i  +  1) 
endfor 


end 
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3.4  The  LINPACK  Downdating  Procedure 

The  following  downdating  procedure  due  to  Saunders  [46]  is  considered  the  most 
accurate  downdating  procedure  that  does  not  require  information  from  the  first  row  of 
U  in  (1.1)  [20]  (if  we  have  that  first  row,  we  obtain  a  procedure  that  is  backward  stable 
in  the  strong  sense).  It  is  the  procedure  that  is  implemented  in  the  LINPACK  [37]. 

Algorithm  3.7  (The  LINPACK  downdating  procedure).  Given  M  e  TZnXn  and 

z  e  TZ  ,  this  algorithm  computes  the  downdated  matrix  M,  that  is,  M  M  —  M  M  - 

T 
zz  . 

Step  1.  Solve 

M^a  =  z  (3.4) 

T  T 

If  ||a|j  >  1  declare  M  M  —  zz  indefinite  and  stop.  Otherwise  go  to  Step  2. 

Step  2.  Compute  a  =  \fl  -  ||a||2  and  Q  =  •  •  -  Q^  £  7^(”+1)x(n+1)  a  prod¬ 

uct  of  Givens  rotations,  =  J(l,t+  1,0.),  i  =  l,...,n  such  that 


Step  3.  Compute 


Q 


/  \ 
a 


\  a  ) 


=  er 


Q 


(  .  \ 


\mj 


{  T  \ 


\M  , 


We  note  that  if  M  is  upper  or  lower  triangular,  it  is  simple  to  choose  Q  as  a  product 
of  Givens  rotations  •  •  •> Qn  so  that  M  remains  upper  or  lower  triangular.  Pan 

[85]  shows  that  for  M  upper  triangular,  this  method  can  be  sped  up  by  combining  the 
forward  substitution  phase  with  the  application  of  the  Givens  rotations. 
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3.5  2x2  Updating/Downdating  Procedure 

The  following  algorithm  computes  orthogonal  matrices  G  =  7(1,2,^  ),  G  = 
J(l,  3,  <f>  )  £  7 for  some  <f>.,  i  =  1,2,  such  that 

a  ^ 


\ 

\ 

0  0 

£  P 

a  6 

a  6 

^  0  C  ) 

Algorithm  3.8  (2x2  Updating).  Given  scalars  a,  6,  c,  p,  this  algorithm  computes 
scalars  a,  6,  c  of  an  updated  matrix  defined  in  (3.5). 

function  [2,6,  c]  =  up22  (a,6,c,£,p) 
a  =  cabs(a,  £);  a  =  a/a;  £  =  £/a 
b  —  ab  —  £p;  c  ~  ap  —  £b 

end 


Similarly  we  give  an  algorithm,  which  is  based  on  Algorithm  3.7  for  downdating  a 
2x2  matrix,  that  is,  computing  orthogonal  matrices  Q  =  J(l,3,^),  Q 2  =  J(l,2,02)  G 


72.3x3  such  that 


\ 

\ 

1 

£ 

P 

a 

0 

0 

0 

a 

b 

T  T 

0 

a 

6 

0 

C  ) 

l7 

0 

C  ) 

(3.6) 


where  we  solve 


\ 

T 

(  ,  \ 

a 

b 

P 

-  4 

C  / 

l7; 

U  i 

«=\W-7, 


(3.7) 
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where  J(i,k,6)  is  a  Givens  rotation  in  the  (i,fc)  coordinate  plane  for  some  6. 

Algorithm  3.9  (2x2  Downdating).  Given  scalars  a,  6,  c ,  /?,  and  7,  this  algorithm 
computes  entries  of  downdated  matrix  defined  in  (3.6),  namely,  2,  6,  and  c. 

function  [2,6,c]  =  down22  (a,6,c,/?,7) 

5  = 

a  =  y/a  +  7-^/5  —  7 
p  =  7c/S 

c  =  ac/a;  a  =  aa;  b  —  ab  —  p(3 


end 
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Chapter  4 


Modifying  the  ULV  Decomposition 


4.1  Introduction 


We  give  methods  for  modifying  the  ULVD.  Rewrite  the  ULVD  of  A  G  TZmxn  0f 


(1.1)  as 


A  =  U 


1 c ' 


\0/ 


VJ 


(4.1) 


where  U  €  7 imxm  an(j  y  6  TZnxn  are  orthogonal,  and 


fc  p-fc  n-p 


( L  0 


C  = 


F  G  0 


P~k 


(4.2) 


0  0 


n—p 


so  that  our  algorithm  of  ULVD  separates  out  columns  that  are  exactly  zero.  Here  C 
takes  the  position  of  M  in  (1.1). 

We  also  rewrite  z  of  (1.9)  as 


(A 


k 


Z 


=  VTr  = 


y 


p-k 


n—p 


(4.3) 
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As  in  the  updating  routine  of  Stewart  [102],  the  matrices  C  and  V  can  be  produced 
using  O(n)  Givens  rotations,  thus  updating  the  factorization  in  0(n  )  operations.  Our 
approach  to  downdating  the  ULVD  (4.2)  uses  ideas  from  chasing  algorithms  [102]  and 
from  the  downdating  algorithm  due  to  Saunders  [46,  85]. 

The  following  are  the  main  results  of  this  chapter: 

1.  A  blockwise  procedure  for  downdating  the  ULVD  that  yields 


(  L  0  0  \ 


C  = 


F  G  0 


V 


0 


0  0 


(4.4) 


where 

IK*  <3)11  <  IK*  G) II,  ||(*  <3)||f<||(*  g)\\f 

and  L  and  G  are  lower  triangular,  and  the  blocks  are  conformal  with  (4.2). 

T  T 

2.  A  downdating  algorithm  that  works  whenever  L  L  —  xx  is  positive  definite,  the 
same  as  if  we  were  downdating  only  L.  Our  technique  maps  back  onto  the  original 
matrix  A  in  a  more  satisfactory  manner  than  the  technique  given  by  Park  and 
Elden  [87]. 

3.  An  error  analysis  of  this  procedure  showing  that  the  singular  subspaces  of  the  up¬ 
dated  matrix  are  as  good  as  can  be  expected.  We  also  give  some  new  perturbation 
results  showing  that  the  condition  of  the  downdate  is  related  only  to  the  L  block 
in  (4.4).  Thus  tracking  the  ULVD  is  a  very  stable  method  for  tracking  subspaces. 

Our  ULVD  downdating  algorithm  is  proposed  in  detail  in  Section  4.2.  Section  4.3 
gives  an  error  analysis  of  the  algorithm.  The  accuracy  of  the  computed  subspaces  for 
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large  and  small  singular  values  is  assessed.  In  Section  4.4,  we  give  numerical  tests  of  our 
algorithm  on  recursive  total  least  squares  problems. 


4.2  A  Procedure  for  Downdating  ULV  Decomposition 

4.2.1  Description  of  the  Algorithm 

We  introduce  the  following  algorithm  for  downdating  the  ULVD.  Our  procedure 

T  T 

produces  a  downdated  matrix  C  if  for  L  and  x  defined  in  (4.2)  and  (4.3),  L  L  —  xx  is 
positive  definite. 


Algorithm  4.1  (Procedure  for  ULVD  downdating).  Given  a  lower  triangular  ma¬ 
trix  C  of  the  form  (4.2)  and  a  vector  z  of  the  form  (4.3),  this  algorithm  finds  a  lower 
triangular  matrix  C  of  the  form  (4.4)  and  orthogonal  matrices  U  and  V  satisfying  (1.12). 
Initially,  V  =  V.  The  components  in  yQ  are  ignored  (will  be  justified  by  Proposition  4.2). 
Throughout  the  description  of  this  algorithm  L ^  and  G^  denote  lower  triangular  ma¬ 
trices,  /j*)  =  [F^]Tev  and  =  e^[G^]er 

Step  1.  Compute  orthogonal  matrices  P  ,  V  E  7 such  that 


G{1)  =  ufGVv  V^y  =  pe[P~k\  p=||y||. 


Also,  compute 

F1'1  =  uJf. 

Update  V  -  diag(/l+1 ,  £?, , 7^;  V  *- V diag (4,  V, ,  /,_,). 
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Step  2.  Find  an  orthogonal  matrix  E  such  that 

' £(1)  h  \  uT(  L  °  ^ 

0  a®  ZZU‘1  <7(1) 

0  5n  /  \  *  9n 

Define  F ^  =  (/— e  )jF^,  that  is,  as  F1^  with  its  first  row  set  to  zero.  Update 
U  «—  U  diag(l, 


Step  3.  Use  Algorithm  3.7  to  find  a  vector  a  E  H  ,  scalar  a,  and  orthogonal  matrix 
U  E  ^*+1>x^+1>  such  that 

o 


Then  compute 


Step  4.  Compute 


if  p  <  then 
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else  <  p 


/  A  X 
P- op 


I  „(2)  \ 


=  ui 


'11 


V  n  ) 


»u  =  0 


(4.6) 


where  bp  —  p  —  +  aFh). 


endif 


Define  1/  =  J(l,k+  1,0)  as  the  Givens  rotation  with  cn  =  cos(0),  sn  =  sin(0) 
such  that 


cn  =  < 


S(3)/9<2) 
yn  'y\\ 

o 


if  #  o 

if  =  0. 


Update  U  «-  UU.  diag (U.,I  A 

4  u  fl — fc 


Step  5.  Find  an  orthogonal  matrix  V 2  G  such  that 

(X(3)  0)  =  (Z<2)  he[)V-2 
(ir(3)  G(4))  =  (f(2) 

Update  V  «-  V  diag(V’2,/n_jfc_1).  If^  #  0, 


and  go  to  Step  7;  otherwise,  go  to  Step  6. 
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Step  6.  If  =  0  then  =  0  also  since  it  was  formed  from  using  V  .  Thus 
11  1  1 1  z 


p-k 


k  P~k 

/>  0  'l 

jX3)  q(4) 


k  P-k 

l  f  \ 

L  0 

0  0 

\F  5<4)/ 


k 

1 

p—k—1 


where  is  a  lower  Hessenberg  matrix.  We  then  find  an  orthogonal  matrix 

V  E  TZ^P~^X^P~^  and  an  orthogonal  permutation  matrix  U  E  7Z^P  k)x(P  k) 
3 


such  that 


£LtG<%,  = 

5  6 


<G  o' 


0  0 


p—k—1 

1 


Update  U 
form 


U  dmg(Ik+vUb,In_p)]V  «-  Vdiag (Ik,V3,In_p).  Thus  C  has  the 


k  p-k-1  n-p+1 

/-  -  -  \ 


c  = 


X  0 
F  G 
0  0 


0 

0 

0 


k 

p—k—1 
n— p+1 


Step  7.  Perform  a  ULVD  of  L  to  determine  its  numerical  rank.  If  we  have  deter¬ 
mined  the  rank  of  L  correctly,  the  rank  of  L  should  be  k  or  k- 1.  Make  appropriate 
adjustments  to  F  and  G. 


We  give  an  expression  for  U  in  the  statement  of  the  algorithm,  but  our  analysis 
assumes  that  the  left  orthogonal  matrix  U  is  not  saved.  Although  it  is  not  computed  by 
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the  algorithm,  for  the  discussion  that  follows  we  need  to  define  the  vector  z  by 

k 

p-k  .  (4.7) 

Tl—p 

Algorithm  4.1  requires  life2  +  6(p-fc)2  +  12k(p  -  k)  +  O(p)  flops  for  Steps  1-5  and 

2 

4 k  +  0(k )  flops  for  Step  6.  When  V  is  modified,  additional  6n(p  —  k)  +  6nk  flops  will 
be  required.  The  deflation  step  requires  about  uk  +  16 nk  flops,  where  v  is  the  constant 
depending  upon  the  condition  estimators  used  [61]. 

Fig.  4. 1-4.2  illustrate  the  action  of  our  algorithm  on  a  6  x  6  matrix  with  k  =  3. 
Here  l,  g,  and  /  denote  components  of  L,  F  and  G,  and  x,  y,  and  h  are  components 
of  those  vectors.  Rightarrows  and  downarrows  denote  premultiplication  and  postmulti¬ 
plication  by  orthogonal  transformations  from  the  left  and  from  the  right,  respectively. 
Here  p ,  h  and  h  have  meanings  consistent  with  those  in  the  statement  of  Algorithm  4.1. 

A  set  of  — ►  denotes  the  applications  of  and  1/  in  Steps  3  and  4  when  downdat- 
ing  L  and  5^  *  ^eP  ®  illustrated  in  Fig.  4.2.  Note  that  for  Steps  3-5,  the  components 
x  and  y  have  a  different  interpretation  from  Steps  1  and  2  above,  they  are  now  the  values 
“downdated.”  To  illustrate  their  action  properly,  U  and  U  are  given  in  reverse  order 

O  i 

rather  than  the  order  that  they  are  actually  applied. 

Remark  4.1.  This  algorithm  works  if 


-T 

z  =  Vz  = 


l  \ 

x 


\  yo  } 


p—k  n~p 

'a  o' 

0  0 


p—k 

n-p 


i 

l 

l 

l 

/X 

X 

X 

y 

y 

y\ 

(X 

X 

a 

y  y 

°\ 

/ XXI 

y 

y 

°\ 

/ 

l 

l 

l 

l 

l 

l 

l  l 

l 

l 

/ 

l 

l 

/ 

l  l  l 

f 

f 

/ 

9 

f 

f 

/ 

9 

f  f 

9 

f 

f 

f 

9 

9 

— ► 

f 

f 

/ 

9  9 

9 

f  f 

9 

9 

\f 

f 

f 

9 

9 

\f 

f 

/ 

9  9 

9 / 

\f  f  J 

9 

9 

9> 

(x 

X 

X 

p 

0 

°\ 

(X 

X 

a 

P  o 

°\ 

/  X  X 

X 

p 

0  0\ 

l 

l 

1 

l 

l 

l 

l 

l 

l  l 

l 

/ 

l 

l 

l 

l  l 

l 

h 

f 

/ 

f 

9 

9 

f 

f 

/ 

9 

- 

f  T 

0 

9 

f 

/ 

f 

9 

9 

f 

f 

/ 

9  9 

f  f 

/ 

9 

9 

\f 

/ 

f 

9 

9 

9  / 

\f 

f 

/ 

9  9 

9> 

\f  f 

/ 

9 

9  9> 

fx 

X 

X 

P  o 

°\ 

(X 

X  X 

p 

0  0\ 

— ► 

l 

l 

h 

l 

l 

h 

l 

l 

h 

l 

l 

l 

h 

l 

l 

h 

f 

0 

0 

9 

0 

0 

9 

f 

/ 

/ 

9  9 

/ 

/ 

9 

9 

\f 

/ 

/ 

9  9 

9 ) 

\/ 

/ 

9 

9  9/ 

(a)  Steps  1  and  2 

l  l  l  i 


* 

- ► 

f° 

0 

0 

0 

0 

(  X 

X 

X 

y 

y 

y \ 

fx 

X 

X 

y 

y 

y\ 

A 

l 

h 

l 

h 

l 

* 

l 

/ 

h 

l 

l 

h 

l 

l 

h 

* 

l 

/ 

l 

h 

l 

l 

/ 

h 

l 

l 

l 

h 

* 

0 

0 

0 

9 

0 

0 

0 

9 

f 

0 

0 

9 

/ 

/ 

/ 

9 

9 

/ 

/ 

/ 

9 

9 

f 

/ 

/ 

9 

9 

\f 

/ 

/ 

9 

9 

9 ) 

\/ 

/ 

/ 

9 

9 

9/ 

\f 

/ 

/ 

9 

9 

<// 

i 

l 

(X 

X 

X 

y 

V 

»\ 

(* 

X 

X 

y 

y 

y\ 

l 

l 

l 

l 

l 

l 

l 

l 

l 

h 

l 

l 

l 

f 

f 

0 

9 

f 

f 

f 

9 

f 

f 

/ 

9 

9 

k  / 

f 

f 

9 

9 

\f 

f 

/ 

9 

9 

0/ 

\/ 

f 

f 

9 

9 

9' 

(b)  Steps  3-5 


Fig.  4.1.  Reduction  Steps  for  Downdating  the  ULVD 
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Fig.  4.2.  Reduction  Steps  When  G  Becomes  Singular 


T  T  T 

is  substituted  for  G  and  ( y  yQ  )  is  substituted  for  y.  That  is,  it  is  not  necessary 

to  explicitly  handle  the  zero  block,  it  can  be  made  part  of  G.  That  is  the  original 

formulation  in  [102].  However,  if  that  is  done,  whenever  yQ  ^  0,  any  zero  diagonal  will 

T  T  T 

be  chased  to  the  position,  all  of  (y  y  )  will  be  treated  as  “noise”.  If  ||G||  >  fi, 
as  is  often  true  in  practice,  then  ||y||  is  possibly  significant  whereas  can  only  result 
from  computing  errors  from  computing  (4.2)  or  multiplying  z  —  V  r.  Our  formulation 
neglects  part  of  y  only  if  the  downdate  of 


L 

F 


with  (x  y  ) 


cannot  be  done. 


In  updating,  there  is  no  similar  benefit  to  separating 


out  the  zero  block. 
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Remark  4.2.  We  note  that  (4.6)  in  Step  4  is  equivalent  to  computing 


gfi  +  (Sp)a 

h  + ( Sp)a 


f>P  =  P~  +  aT h) 


~T  T  T  ~ 

since  U  maps  the  additive  noise  vector  ( Sp)(a  a  )  into  (Sp)e  .  Only  h  is  of  interest 

O  1 

in  the  computation. 

C  satisfies  (4.4)  as  is  proven  quickly  in  the  following  proposition. 

Proposition  4.1.  For  the  matrix  C  resulting  from  Algorithm  4-1  we  have  (4-4). 

Proof.  This  proof  is  a  straightforward  consequence  of  facts  about  orthogonal 
transformations.  Every  step  except  two  and  four  either  does  not  affect  F  and  G  or  just 
multiplies  them  on  the  left  or  right  by  orthogonal  transformations,  thus  for  those  steps 
we  need  only  invoke  orthogonal  invariance. 

For  Step  2,  we  have  that 


-e  eTP(1) 

er  i  ’ 


Thus  ||(.F^  G^)||^  <  IK-F1^  G^)||^  follows  from  the  fact  that  F ^  and  G^  are 
no  larger  than  and  G^  componentwise.  For  the  Euclidean  norm,  we  note  that  for 


any  vector 


r„<»  \  * 

^  U(2f  J  P~k 
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thus 


||(F(1)  G(1)H|2  =  \\(F{2)  G^)v\\2  +  ({f(l)}Tv{1))2 


> 


,  ,,  (1),2  f  (2),2V  T  (2)x2 

+  ({%'}  “  {9n  }  )(e1vV  ') 

||(F(2)  G(2))u||2 


for  all  v  €  7?”.  A  simple  argument  on  the  definition  of  the  Euclidean  norm  yields  the 
inequality  ||(.F^  G^)||  >  ||(.F^  G^)||. 

Step  4  clearly  produces  |<7^|  <  The  same  argument  as  above  yields 


IK^2)  G(3))||  <  ||(F(2)  g(2))|| 


and  likewise  for  the  Frobenius  norm.  Thus  we  have  (4.4).  □ 

In  the  absence  of  rounding  error,  we  can  show  that  the  “additive  noise”  in  Step  4 
satisfies  an  important  consistency  property.  We  assume  that  C  is  orthogonally  equivalent 
to  A  -f-  6  A  where  6 A  consists  of  errors  from  previous  orthogonal  transformations. 

PROPOSITION  4.2.  Assume  that  Algorithm  4-1  is  performed  in  exact  arithmeticf  that  U 

~  ~  _  ~T 

and  V  in  (4-1)  are  exactly  orthogonal  Let  U  and  V  be  as  in  (LIS)  and  let  z  =  V  r  be 

~  T 

as  in  (4- 7).  Assume  that  Ue ^  =  e^,  and  that  z  =  V  r  is  computed  exactly.  Also  assume 
that  C  satisfies  (4-1)  with  backward  error  bA,  that  is 


A  +  SA 


T  r  T  \ 

„  \ 

r  +6r 

c 

=  U 

^  A  +  3A  J 

l°] 

(4.8) 
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Then  z  and  C  satisfy 


A  +  8Aq  = 


(  T  \ 

T  \ 

z 

r 

=  £/■ 

C 

^  A  +  bA  j 

\  0  / 

(4.9) 


Thus  \\8A0\\  =  \\6A\\  <  \\SA\\  and  the  same  result  holds  for  the  Frobenius  norm. 


Proof.  We  have  that 


UJ 


1 c ' 


V  "  / 


iMlk,vvi  )  = 


Z  +  &C 

(P  ~ 

0 

jr(2) 

he ? 

0 

F(2) 

g(3) 

0 

We  also  have  that 


/  -T  \ 


) 


*  per  yQ 
(2)  r  T 


T  \ 


L he^  0 
jp(2)  G(3)  0 

0  0  0 
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Let  Vj  =  V  diag(/fc,  V^,In  )  then  from  the  definition  of  V  in  Algorithm  4.1  it  follows 
that 


/ 


U 


(  t\ 

c 


\  0  I 


~  t  ~ 
Vj  =  u 


T  T 
x  pe  y 


1  *0 


^  he 


0 


J’C2)  0 

0  0  0 


diag^.vf,/  )Fr.  (4.10) 


However,  we  have 


U 


^  C  ^ 


V0/ 


V  =  u 


(  T  \ 

x  +  Sx  (p  —  6p)e j  0  ' 

X(2)  he^  0 


F(  2) 

0 


g(3)  0 

0  0 


(4.1D 


=  A  +  6A  = 


(  T  ,  ,  T 
t  +  or 

A  +  6A 


(4.12) 


Comparing  equations  (4.10)  and  (4.11),  using  the  fact  that 


v  "»*('*•  V »-») 


/  \ 

X 

\  »o  ) 


=  r 


and  the  assumption  that  Ue  =  e^,  we  have  (4.9).  □ 

Therefore,  we  have  shown  that  the  additive  noise  in  Step  4  and  the  act  of  ignoring 
yQ  actually  make  the  matrix  C  closer  to  being  orthogonally  equivalent  to  A  than  C  is  to 
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A.  In  Section  4.3,  we  show  that  the  results  of  Propositions  4.1  and  4.2  make  Algorithm 
4.1  a  robust  algorithm  for  tracking  the  ULVD. 

4.2.2  Relation  to  Park  and  Elden’s  URV  Procedure 

A  recent  report  by  Park  and  Elden  [87]  gives  a  method  for  downdating  the  URVD. 
For  that  algorithm,  we  are  downdating  an  upper  triangular  matrix  M  of  the  form  (1.4). 
The  procedure  is  as  follows. 

1.  Find  an  orthogonal  matrix  U ^  and  an  upper  triangular  matrix  R  such  that 


T  \ 

_  \ 

X 

0 

=  u. 

1 

I 

R  ) 

Determine  S  and  y  such  that 


(?) 


\ 


(4.13) 


(4.14) 


2.  Find  an  orthogonal  matrix  U  and  an  upper  triangular  matrix  T  such  that 


(  ~T  \ 

n  \ 

y 

0 

=  un 

2 

f  j 

KT) 

The  downdated  matrix  M  is 


(4.15) 
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as  before. 

Park  and  Elden  [87]  recommend  the  use  of  hyperbolic  rotations  in  (4.14).  That 

rp  rp 

can  be  avoided  by  a  simple  and  well-known  trick.  Let  (c^  a  )  be  the  first  column  of 
as  determined  in  (4.13).  Then  we  note  that 


/  \ 


y  =  (y  s  ) 


a. 


which  implies  that 


y  =  (*l  1(y-  STal). 


Once  y  is  determined,  we  obtain  5  from 


<  T\ 
\  ^  ) 


=  u. 


( f\ 

\ 5  / 


71  _ ... 

If  T  T  —  yy  is  indefinite,  Park  and  Elden  substitute  for  (4.15)  the  operation 


Ui 


\ 

T  \ 

0 

pe 

V  = 

n 

\T> 

T 

1  / 

-T  ~ 

where  V  is  an  orthogonal  matrix  such  that  V  y  =  pe  and  so  that  T  is  triangular.  They 


then  compute 


{  -T  \ 

y 

\  *  I 


V  5  / 


v. 
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where  y  =  V  y.  The  downdated  matrix  T  =  T  except  for  t  ,  q  = 

qq'  H 

satisfies 


n  —  k.  That  entry 


X\tqq\<\p\ 

otherwise  . 


T  T 

The  first  condition  is  identical  to  the  case  where  T  T  -yy  is  indefinite. 


Remark  4.3.  Algorithm  4.1  is  related  to  the  above  algorithm  although  the  essential 
steps  of  them  were  developed  independently.  Steps  1  and  2  of  Algorithm  4.1  reduce  the 
ULVD  downdating  problem  to  that  of  downdating  the  (fc  +  1)  x  (A:  +  1)  upper  triangular 
matrix 

k 

<Lw 

0 

The  Park-Elden  algorithm  applied  to  this  matrix  would  perform  Steps  3  and  4  of  Algo¬ 
rithm  4.1  except  that  it  would  perform  Step  4  according  to 


jr(2) 


(4.16) 


„  \ 

\ 

9 

P 

\  h  j 

K  *  ) 

(4.17) 


instead  of  according  to  (4.6).  The  consistency  property  in  Proposition  4.2  does  not  hold 
if  we  use  (4.17).  That  can  be  seen  when  (4.17)  is  placed  into  the  proof.  Equation  (4.10) 
does  not  change,  but  we  do  have 


7(2) 

'll 


P  ~  [<5p]<* 

^  h  -  [Sp\a  t 


(4.18) 
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~  ~  (2)  —1 

where  Sp  =  p  -  gK^J  =  a  Sp  using  terminology  from  Algorithm  4.1.  We  note  that  the 
additive  noise  is  multiplied  by  a  factor  of  a-1  >  1,  but  that  is  not  crucial.  Equation 
(4.18)  implies  that  equation  (4.11)  in  the  proof  of  Proposition  4.2  is  replaced  by 

x  +  8x  (p  —  [8p\a)e^  0 
X(2)  (h  -  [i 8p\a)e *  0 
F(2)  G(Z)  o 

0  0  0 

Thus  the  consistency  property  of  Proposition  4.2  does  not  hold. 

The  Park-Elden  URVD  algorithm  requires  5k?  +  5(n—k)^+8k(n  —  k)  +  0(n)  flops. 
This  algorithm  is  fewer  flops  than  is  required  to  downdate  the  ULVD  by  Algorithm  4.1, 
but  there  is  an  important  advantage  to  maintaining  the  ULVD  instead  of  the  URVD. 
Both  the  URVD  and  the  ULVD  will  tend  to  produce  a  V  matrix  such  that 

V  =  (Vj  v2)>  Vi  6  Knxk,  V2  e  7Znx^n~k') 

where  range( )  and  range(V2)  are  approximations  to  the  subspaces  associated  with  the 
first  k  and  last  n  —  k  singular  values  of  A  respectively.  However,  as  found  by  Fierro  and 
Hansen  [43],  if  F  in  (4.2)  and  S  in  (1.4)  satisfy  ||F||  a  ||S||  and  if 

inf  ||.Rz||  >  e  >  max  ||Gty||, 

11*11=1  IMI=1 


then  the  ULVD  yields  a  more  accurate  approximation  of  the  desired  subspaces  of  A. 
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4.3  Error  Analysis 

4.3.1  Error  Bounds  on  Algorithm  4.1 

~  (2)  . 

We  begin  by  showing  that  if  p  <  5^'  in  Step  4  and  yQ  =  0,  then  Algorithm  4.1 
produces  a  matrix  C  such  that 

VTCTCV  ~(z  +  Sz)(z  +  Szf  =  (C  +  Scf(C  +  SC)  (4.19) 


for  some  orthogonal  matrix  V.  This  is  the  so-called  mixed  stability  criterion  defined  in 

~  (2) 

Definition  2.7.  If  in  Algorithm  4.1,  yQ  ^  0  or  p  >  ,  then  we  are,  in  fact,  producing 

C  such  that 


VTCTCV  -  (z  +  Sz  +  Sz  )(b  +  Sz  +  Sz  f  =  (C  +  SC)T (C  +  SC) 


for  some  orthogonal  matrix  V.  In  the  context  of  Algorithm  4.1  , 


‘  M'i+i  N 

V  "o  / 


(4.20) 


(2)  T 

where  6p  =  p  —  7  +  a  h ).  Note  that 


ll^oll2  =  IVI2  +  ll®0l|2  =  \p-  («5|2)  +  »T'*)|2  +  ll!/0l|2. 


We  note  that  V  =  I  ,  if  Step  6  is  not  done.  The  effect  of  Sz  is  discussed  in  Appendix 

o  p  /C  U 

B  of  [13].  It  is  essentially  the  same  as  the  effect  of  Sb  as  bounded  in  Proposition  4.3. 
The  analysis  in  this  section  will  ignore  that  effect,  it  will  assume  that  we  are  analyzing 
the  problem  (4.19). 
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We  define  the  scaling  matrix 

A  =  dia.g(4,||G||||r1||/n_t).  (4.21) 

We  expect  the  rounding  errors  from  Algorithm  4.1  to  be  columnwise  proportional  to 
diagonals  of  A. 

THEOREM  4.1.  Let  Algorithm  4-1  be  performed  on  a  matrix  C  of  the  form  (f.2)  in 
floating  point  arithmetic  with  machine  unit  p.  Then  there  exist  orthogonal  matrices 
U  €  ft(n+1)x(n+1)  ,V  £  HnXn  such  that 


where 


'  z+  6z 
1  C  +  6C 


0 

C 


V 


Sz  =  ||A_1te||,  6c  =  \\6CA~1\\ 
6z,6c<<f>(p )  \\C\\p  +  0(p2) 


where  A  is  defined  in  (4-21)  and  <f)(p)  is  a  modestly  sized  function  of  p. 

Theorem  4.1  is  proven  in  Appendix  A  of  [13].  This  theorem  is  somewhat  similar  to  error 
bounds  on  orthogonal  factorization  of  matrices  with  disproportionate  rows  [8,  12]. 

The  following  corollary  gives  the  error  bounds  that  we  get  if  we  use  no  structure 
of  the  problem  (4.2)  or  the  resulting  matrix  (4.4). 

Corollary  4.1. 

<<£c(p)  \\C\\p  +  0(p2) 


(4.22) 
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where  <^(p)  is  a  modestly  growing  function  of  p. 

In  the  next  section,  we  discuss  the  effect  of  these  rounding  errors  on  the  singular 
vectors  of  the  matrix  C . 

4.3.2  Effect  of  Rounding  Errors  on  Singular  Vectors 

A  common  reason  for  computing  the  ULVD  is  to  separate  subspaces  associated 
with  large  and  small  singular  values.  For  the  ULVD,  after  downdating  ,  there  will  be  / 
large  singular  values  where  /  =  k  or  /  =  k  —  1.  If  W  —  (W  W _),  W  £  7ZnX\  W  £ 

i  Z  1  Z 

fixifi _ /)  — 

7Z  v  Ms  the  matrix  of  right  singular  vectors  of  C,  then  the  computed  range(Wr1) 
and  range(lV2)  should  be  as  close  as  possible  to  the  expected  ranges.  In  this  section, 
we  show  how  reliable  we  can  expect  these  subspaces  to  be.  Our  bounds  are  somewhat 
better  if  l  =  that  is,  if  the  downdate  does  not  alter  the  rank. 

We  need  to  measure  the  effect  of  both  6^  and  6^,  on  the  invariant  subspaces 
associated  with  the  l  largest  singular  vectors  and  the  n  —  l  smallest  ones.  We  define  C 
and  C  as  matrices  such  that 


CTC  =  VTCTCV  -  zf 


CTC  =  VTCTCV  -  (z  +  6z)(z  +  6zf. 


(4.23) 

(4.24) 


Defining  C  as  in  (4.19)  implies  that 


5  =  C  +  6C. 


(4.25) 


It  is  also  necessary  to  define  u  by 


u  =  max{l,||[X^]  Tx||,||I  Tx||). 


(4.26) 
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Note  that  the  definition  of  u  involves  only  the  L  block  and  is  related  to  the  condition 
number  given  by  Pan  [86]. 

First,  we  need  the  following  three  technical  lemmata. 

LEMMA  4.1.  The  vector  h  and  scalar  p  from  Algorithm  4-1  satisfy 

<  INI.  (4.27) 


Proof.  We  note  that 

/  \ 

P 

\h) 

which  completes  the  proof.  □ 

LEMMA  4.2.  Let  z  be  as  in  (4-7)  resulting  from  Algorithm  4-1 •  Then 

llfll<IINI(l  +  «). 


( V  0  )D 


G 


<  INI, 


(p\ 


Proof.  From  Algorithm  4.1  and  (4.20)  we  have  that 


\ 

\ 

X 

X 

2  = 

V 

=  Vl  •  ‘n-p>  '  W 

y 

U  ) 

By  the  orthogonality  of  V  we  have 

O 


iii/ii  =  m, 


where 


/  \ 

x 

\pj 


(  \ 
X 


\p ) 


Now  consider  the  least  squares  problem: 


mm 

CL^zlZ, 


l  r,(2  ),T  \  /  \ 


[L^Y 


~hT 


a  — 


X 

\p ) 


Since 


<  lt\ 

i Ttt 

'  [i<2>]T  'I 

K  0  ) 

=  V2 

rT  \ 

h  y 

we  have 


\p\  =  min 
aenk 


( [i(2)]T 


a  — 


h 1 


l  \ 

x 


\P  ) 


^  (2)  —T 

Let  a  —  [Zv  ']  x .  Then  from  Lemma  4.1 


\p\  < 


[L 


(2)i  T 


hT 


a  - 


(  \ 


\p  i 


rT~ 


=  |  ha  — 


<  \p\  +  \hTa\  <  ||G||  +  ||G||  |||I(2)]-Tx 


<  l|G||(l  +  w), 


which  completes  the  proof.  □ 


It  is  necessary  to  bound  the  effect  of  A  and  z  on  the  singular  vectors  of  C. 


Lemma  4.3.  Let  w i  =  1,2,. .  ,,n,  be  the  right  singular  vectors  of  C.  Let  A  be  defined 
by  (4-21),  let  z  be  as  in  Lemma  4-2,  and  let  u  be  given  by  (4-26).  Then 


1  r— 1. 


[| Aw  ||  <  \\L  X\\M  +  \\<T<U  \\  K  +  lie'll),  *  =  1, 2, . .  .,p 


\z  w.|  <  u  (a.  +  2  ||G||),  i  =  1,2, . .  .,p. 


Proof.  We  note  that  for  i  =  1,2 , . . . ,  p, 


Itvis  obvious  from  the  form  of  C,  that  the  last  n  —  p  components  of  all  its  nonzero  singular 
vectors  will  be  zero.  It  should  also  be  noted  that  w.  =  e.,  i  =  p+  1, . . . , n  form  a  singular 
subspace  for  the  last  n  —  p  zero  singular  values.  For  i  =  1, 2, . .  .,p,  we  have  that 


||A«,.||2  =  ||TO<,)||2  +  ||G||2||I-1||2||u,f)||2. 


Taking  square  roots  establishes  the  first  bound  on  ||Au>.||.  Continuing  we  have 


<  p-'llVf+IN2) 

<  lli_1ll2(<’i  +  IIG|l)2- 
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Taking  square  roots  again  establishes  the  second  bound  on  ||Att^.||. 
Now 


\zTw.\  <  \xTw^\  +  \yTw^\ 

<  \xTL~lLw{^\  +  \\y\\ 

<  \\rTx\\  a.  +  \\y\\ 

<  «(a.  +  ||G||)  +  ||G|| 

<  w  (a.  +  2  ||G||), 

which  completes  the  proof.  0 

Now  we  give  a  perturbation  bound  on  the  effects  of  the  backward  errors  ||£z||  and 

Ml- 

Lemma  4.4.  Assume  the  terminology  of  Lemma  4-3.  Let  w.,  w.,  i  =  1,2  ,  ...,n  be 
the  right  singular  vectors  of  C ,  Cf  and  C  in  (4-19),  (4-%3)>  and  (4-%4)  respectively.  Let 
a  >  *  *  *  >  v n  be  the  singular  values  of  C ,  and  let  >  •  •  •  >  a ^  be  the  singular  values 
of  C.  Ifa.>a.  and  o.  >  a .  we  have 


i  ~T  ^ 
\w .  w . 

2  J 


I  <  «  INI 


+  OCII^II2) 


(4.28) 


■  T~  |  . 
\w.  w  A  < 

1  i  j'  ~ 


\m 


+  0(\\sc\ I2). 


a .  —  cr . 
*  J 


(4.29) 
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Proof.  Equation  (4.29)  is  just  a  standard  error  bound  based  upon  the  perturbation 
-T  - 

of  the  eigenvectors  of  C  C.  From  the  Kato  [69]  expansion  for  eigenvectors  we  obtain 


\~T~  i 
\w.  w .  = 

1  2  / 


< 


w'^(z(Sz)^  +  (Sz)z^  +  (6z)(6z)T)w  . 
1 _ 3 


~2  ~2 
a.  —  a . 

*  3 

ll^ll  +  |2T® ,1) 

- Zr~T2 - -  +  <WI  )• 

cr.  —  a . 

*  3 


+  o(\M2) 


r-l|2\ 


From  Lemma  4.3  we  have 


a  +<t.  +  4  ||G|| 

|5f«i  I  <u,  H^II-1— f— j - +  0(||«z||2). 


cr.  —  <x. 

i  j 


An  algebraic  simplification  leads  to  (4.28).  □ 

Simply  using  the  definition  of  the  ||  •  ||^  norm  leads  to  the  following  proposition. 
It  tells  us  how  good  our  subspaces  will  be  if  we  only  have  a  bound  of  the  form  (4.22). 


Proposition  4.3.  Assume  the  terminology  of  Lemma  4-3.  Let  W  =  ( W ^  VF^),  £ 

TlnX\  £  y^nx(n  0  matrix  0j  right  singular  vectors  of  C  and  let  W  = 

( W j  Wy)  be  the  corresponding  matrix  of  right  singular  vectors  of  C.  If  a ^ 
then 


llW'fw'jllp  <  'Jl(n-l) 


«  ll«*ll  +  ll«?||  4«||G||  ||<f|| 

n  _  n  r  2  2 

*  1+1  ai~ai+ 1 


+  o 


\‘e  I 


2\ 


Proposition  4.3  applies  even  if  the  downdate  changes  the  rank.  The  condition 
number  u  in  (4.26)  depends  solely  upon  the  L-block,  the  matrix  C  does  not  have  to  be 
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well-conditioned  or  even  full  rank  for  the  downdating  problem  to  be  well-conditioned  as 
is  required  in  previous  analyses  [86,  87],  If  there  is  no  rank  change,  that  is  if  /  =  k,  we 
can  get  an  even  better  bound  as  shown  below. 

We  define  e  and  E  according  to 

e  =  A~i6z/6z,  E  =  6CA~*/6C,  (4.30) 

where  6^  and  6^  are  the  bounds  from  Theorem  4.1.  Note  that  ||e||  =  ||f?||  =  1.  First,  a 
technical  lemma  characterizes  the  vector  z  that  is  downdated. 

Lemma  4.5.  Assume  the  results  and  terminology  from  Theorem  4-1 ■  Let  w.,  i  =  1, 2, . . n 
be  the  right  singular  vectors  of  C  and  let  w.  ,  i  =  1,2 , . . n  be  the  right  singular  vectors 
of  C.  Let  a  >  >  . . .  >  a  be  the  singular  values  of  C.  Then  for  all  i  and  j  such  that 

a.  ^  a .  we  have 

i  r  j 

6 

<  Z2  ^ ~2 (llA^yll  +  llA™;ll  \zT™j\)  +  °(&2Z)-  (4-31) 

*  j 


Proof.  From  Kato[69],  we  have  that 


~T  ~ 
w .  w . 

j  j 


6 

z 


_  T  '  T 

w .  A (ze  +  ez  )Aw. 
j  i 


+  o(s2z) 


where  e  is  defined  in  (4.30).  Taking  norms  we  obtain  (4.31).  □ 

Lemma  4.6.  Assume  the  results  and  terminology  from  Theorem  4-1-  Let  w.,  i  =  1,2, . .  .,n 
be  the  right  singular  vectors  of  C  and  let  ,  i  =  1, 2, . . . ,  n  be  the  right  singular  vectors 
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of  C  —  C  +  6C  as  in  (4-25).  Let  >  . . .  >  a  be  the  singular  values  of  C .  Then 

for  all  i  and  j  such  that  a .  >  o .  we  have 

i  3 


I  w  .1  <  2 

i  j 


sc  l|£-1||  (»•  +  IIGII) 

a.  —  a . 

i  j 


+  o(s2c). 


(4.32) 


Proof.  Again  using  the  Kato  [69]  expansion 


„  wT(C  6C  +  CT6C)w. 

=  - T— 2 - 2  + 

<7 .  —  <7  . 
t  3 


Using  the  definition  of  E  in  (4.30)  we  have 


„  wTAECw.  +  wTCTEAw. 

wTw.  =  - % - L - 1  +  0(S2). 


i  j  C 


(T.  -  <7  . 

*  3 


Using  norm  inequalities  we  have 


||A® ||  ||£||  ||C®  ||  +  ||C®  ||  ||£||  ||A«  || 

K  ».l  <  V- — ' - ^ +  O(sl). 


i  j 1  -  c 


2  2 
o.  —  o . 

i  3 


(4.33) 


Using  Lemma  4.3  and  the  fact  that  ||Cw^||  =  a  yield 


\wfw.\  <  6C 


II  l' 


(*•  +  INI)  aj  +  (Vj  +  INI)  ai 


2  2 

(J.  —  <7  . 

*  J 


+  0(6*). 


Reorganizing  leads  to  (4.32).  □ 
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Now  we  can  bound  the  quantities  in  our  lemma  on  the  singular  vectors.  These 
bounds  actually  apply  to  the  singular  vectors  of  C,  but  we  will  ignore  second  order  effects 
and  use  these  bounds  for  the  singular  vectors  of  C. 

Now  we  need  a  bound  on  the  errors  in  the  singular  vectors  of  C  from  6  .  To 
within  rounding  error,  the  same  bounds  as  in  Lemma  4.3  will  hold  for  w.,  i  —  1,2, . .  ,,n. 
That  is,  \\zTw.\\  <  u  {a.  +  2  ||G||)  and  ||A«y|  <  ||X_1||  (e.  +  ||G||). 

Lemma  4.7.  Assume  the  terminology  of  Lemma  4*3  and  the  results  and  terminology  of 

Theorem  A-l-  Assume  that  o.  >  o ..  Then 

*  3 

^  H2“i  it  " 

<  2  +  1.5  ||G||  +  2  WG^Kc.  +  5.)]  +  0(«2). 

*  3 

Proof  Combining  Lemmata  4.3  and  4.5  yields 

'r  6  u)  ||Z  ||  r 

*  J  -  ~2_-2  |fi  +  2  INIX^- +  llgll) 

*  3 

+  (*■  +  2  IIGIIKS.  +  IIGII)]  +  0(62z) 

6  u  ||i_1||  _  _  _  2  2 

-  2  - — (o  o  +  1.5  ||G||(5  +  o  )  +  2  ||G||  )  +  0(6  ) 

o,-o.  J  J 

i  3 

6  u)  ||  T  || 

<  2  .  \s  +  1.5  lie'll  +  2  ||G||2/(5  +  5  )  +  G(«2), 

0,-0.  J  1  J  z 

l  J 

which  completes  the  proof.  □ 

The  proof  of  the  following  theorem  is  obvious  from  Lemma  4.7. 
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THEOREM  4.2.  Assume  the  hypothesis  and  terminology  of  Lemma  4- 7.  Let  W  be  the 
matrix  of  singular  vectors  of  C  and  let  W  be  the  matrix  of  singular  vectors  of  C .  If 


W  =  (Wl  W2),  W  =  (Wl  W2), 


where  W,  €  Unxk ,  and  W2,W2  6  Knx(n  ^  then 


~T~  / - 6  «  ||X  || 

\\wfw  \\  <  2  \Jk{n  -  k)  f—= - X 

k  k+ 1 

>*+, + 15  iigii + 2  ii^ii  X + ?t+1>l + 


Analogously,  we  can  bound  the  effect  of  8^. 

Theorem  4.3.  Assume  the  hypothesis  and  terminology  of  Lemma  J[.6.  Let  W  be  the 
matrix  of  singular  vectors  of  C  —  C  +  8C  and  let  W  be  the  matrix  of  singular  vectors  of 
C.  If 

W  =  (W1  w2),  W  =  (Wl  w2), 
where  W1,Wie'Rnxk,  and  W  ,W  €  Tinx(n~k)  then 


T —  , -  U|£  *11  K+l  +  lie'll)  ,  2  , 

\\^W2\\F  <  2  yjHn-k)  - - - +  0(S2C). 


The  effect  of  8n  is,  in  fact,  somewhat  less  critical  than  that  of  8  as  has  been 
stated  in  other  analyses  of  this  problem  [86,  87,  99].  We  note  that  the  error  bounds  in 
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Theorems  4.2  and  4.3  are  relative  gap  bounds  on  the  error  in  the  subspaces  similar  to 
those  in  [11,  36], 

If  l|X_1||  ||<7||  <  1,  these  bounds  are  a  significant  improvement  over  those  in 
Proposition  4.3.  This  is  one  of  the  reasons  for  maintaining  the  property  (4.4). 

4.4  Numerical  Examples 

In  this  section,  we  present  a  few  examples  from  numerical  experiments.  These 

tests  were  performed  using  Matlab  on  a  SPARCstation  5  in  IEEE  Standard  double 

__  1 6 

precision  with  machine  precision  «  10  .  The  algorithm  employed  the  sliding  window 

technique  described  in  Section  2.7.3. 

At  each  step  of  the  sliding  window  method  with  the  window  size  mQ,  an  x  n 

data  matrix  is  constructed  from  an  m  x  n  observation  matrix  A  by  adding  a  new  row  to 

the  data  matrix  in  the  previous  window  and  deleting  the  oldest  row  from  it.  In  step  j, 

the  row  m  +  j  of  the  observation  matrix  is  added  and  the  row  j  is  deleted,  giving  the 

data  matrix  A .. 

J 

Computing  the  ULVD  of  initial  data  matrix  is  described  in  Section  2.5.2.  Then 
Algorithm  4.1  takes  the  lower  triangular  matrix  (middle  part  of  the  decomposition), 
the  orthogonal  matrix  (right  part)  as  initial  input  and  the  modifying  vector  r,  and 
successively  modifies  these  matrices  at  every  window  step.  The  vector  z  —  V  r  is 
computed  at  the  beginning  of  each  window  step. 

We  tested  our  algorithms  in  the  context  of  the  total  least  squares  (TLS)  problems. 
See  Section  2.7  for  details.  We  used  the  TLS  solutions  from  the  Jacobi  SVD  as  reference 
in  checking  the  accuracy  of  the  solution  and  rank  estimates  of  our  algorithms. 
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Fig.  4. 3-4.5  show  the  rank  estimates  by  our  algorithm,  which  are  identical  with 
those  of  the  Jacobi  SVD  algorithm.  The  horizontal  axis  represents  the  window  steps  and 
the  vertical  axis  the  numerical  rank  of  the  window  matrix. 

The  distance  between  the  subspaces  is  given  in  the  next  plot  using  the  Definition 
2.3.  Let 


A. 

J 


YX.WT, 

JJJ 


w.  =  (Wn  wj2) 


be  the  SVD  of  A  .  computed  by  the  one-sided  Jacobi  method  at  step  j .  Let 


A . 
J 


T 

u.c.v 1  , 

JJJ 


VJ  =  <b'i  V 


be  the  ULVD  of  A.  computed  by  Algorithm  4.1.  Note  that  here  we  are  discarding  V 
Finally,  let 

C.  =  ?.X.WT ,  W.  =  (W.,  kF.J 

J  JJJ  J  J 1  J^ 

by  the  SVD  of  C .  computed  by  the  one-sided  Jacobi  method.  Define  W.  by 


W.  =  (Wt,  W.n)  =  V.W.. 

J  J 1  ;2'  j  j 


Define  the  angles  between  the  subspaces 

sin(#2)  =  l|W^V.2|l,  »in(*3)  =  IIW^II.  (4.34) 

The  angles  6.,  i  =  1,2,3  represent,  respectively,  error  between  the  true  noise  sub¬ 
space  from  the  Jacobi  SVD  and  the  approximate  one  from  tracking  the  ULVD,  the 
approximation  error  from  the  ULVD,  and  the  subspace  errors  from  ULVD  subspace 
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tracking.  The  value  sin(^)  is  one  that  is  bounded  by  our  error  analysis.  We  plot¬ 
ted  logjg(sin(0p),i  =  1,2  in  solid  and  dotted  lines,  respectively,  in  the  vertical  axis  of 
the  second  graph.  It  turned  out  that  log1Q(sin(03))  was  almost  indistinguishable  from 
log  (sin(0j)),  so  we  did  not  plot  it.  sin(<?2)  is  the  approximation  error  discussed  by 
Fierro  and  Hansen  [43]. 

Finally,  the  TLS  errors 

IJxfVD)  _  .(ULVDIh 

r'=  5  llf™,ll 

are  given  in  logarithm  in  the  last  plot.  Here,  anc[  a^^VD),  arg  so}u^jons 

using  the  SVD  and  the  ULVD,  respectively. 

For  our  condition  estimators,  we  use  the  LINPACK  condition  estimator  to  approx¬ 
imate  the  left  singular  vector  that  corresponds  to  the  smallest  singular  value,  followed  by 
inverse  iteration  using  this  approximate  singular  vector  as  the  initial  vector.  The  tests 
show  that  up  to  three  steps  of  inverse  iterations  suffice  the  accuracy  of  the  approximate 
smallest  singular  value  required  by  the  algorithm. 

EXAMPLE  4.1.  A,  a  110-by-6  random  matrix,  b,  a  110-by-l  random  vector.  Entries  of  A 

and  b  were  chosen  from  a  uniform  distribution  on  the  interval  (0, 1).  85  randomly  chosen 

rows  of  ( A ;  b )  were  multiplied  by  7  =  10  ^  in  order  to  vary  the  rank  of  the  matrix,  and 
_2 

tol  =  10  .  The  window  size  p  used  was  12. 

_ g  _ 0 

Example  4.2.  Same  as  Example  4.1  except  that  7  =  10  and  tol  =  10 

4 

Example  4.3.  Same  as  Example  4.1  except  that  the  matrix  had  an  outlier  of  size  10 
at  (18, 1)  position. 
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The  first  plot  shows  that  our  algorithm  estimated  the  numerical  ranks  correctly 
throughout  the  sliding  window  steps  in  spite  of  frequent  rank  changes.  Although  the 
errors  in  tiny  singular  values  were  relatively  large,  and  the  small  singular  values  were 
almost  always  overestimated,  they  were  close  enough  to  correctly  estimate  the  rank.  Thus 
the  rank  estimates  from  our  algorithm  and  those  by  the  Jacobi  SVD  were  identical.  As 
expected,  the  errors  in  the  TLS  solution  r.  are  almost  exactly  the  same  as  the  size  of 
sin(0j)  in  (4.34). 

The  second  plot  in  each  figure  shows  that  the  noise  subspace  error  is  very  small 
giving  accurate  TLS  solutions.  The  quantities  in  (4.34)  are  shown  to  be  essentially 
identical  indicating  that  the  subspace  errors  from  our  algorithm  are  from  the  rounding 
errors,  not  approximation  errors.  We  note  that  the  Example  4.1  has  greater  error  in  the 
noise  subspace  than  Example  4.2. 

This  is  probably  because  Example  4.1  has  only  a  small  relative  gap  in  the  spectrum 

—5  _ o 

around  7  =  10  but  a  large  relative  gap  around  7  =  10  .  However,  for  all  of  our 

examples,  the  approximation  to  the  subspaces  by  the  ULVD  is  very  good.  Since  the  error 

bounds  on  the  distance  between  the  noise  subspaces  depend  on  the  ( k  +  l)-st  singular 

value,  the  approximated  singular  subspace  gets  better  as  7  decreases  as  shown  in  Fig. 

4.3-4. 4. 

The  TLS  solution  errors  behave  very  much  the  same  as  the  noise  subspace  errors. 
From  (2.18)-(2.19)  it  is  not  difficult  to  see  that  the  TLS  errors  differ  from  the  noise 
subspace  errors  only  by  a  constant  factor. 

Moreover,  the  algorithm  performs  well  even  when  G  becomes  singular  (indicated 
by  in  the  first  plot).  We  tested  several  other  examples,  and  these  results  were  typical. 

Since  our  downdating  procedures  use  the  LINPACK  downdating  algorithm,  it  is 
not  difficult  to  generate  the  cases  where  the  algorithm  breaks  down  when  ||a||  >  1,  for 
instance,  when  deleting  a  row  that  contains  outliers.  In  this  case,  we  first  refine  the 
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decomposition  [100,  104],  that  is,  compute  an  orthogonal  matrix  U  such  that 


\ 

~  ~  \ 

L  0  ' 

L  H 

u 

= 

V  F  GJ 

d  ) 

(4.35) 


~T 

If  it  is  still  true  that  ||a||  >  1  even  after  the  refinement,  where  L  a  =  x,  the 
corrected  semi-normal  equation  (CSNE)  approach  [18]  (indicated  by  in  the  first 
plot)  is  used  for  computing  a  with  higher  accuracy.  It  is  essentially  the  same  as  that 
used  by  Park  and  Elden  [87]  and  is  given  by 


Ly  =  a ,  t  =  el  -  X.V^y 

l7 6a  =  vT'xTt,  a  =  a  +  6a 
i  3 

L6y  =  6a,  t  =  t  —  X  V^6y,  a  =  ||f|| 


where  xj  —  (r  ,  the  j-th  window  matrix  augmented  with  the  row  being  deleted. 
Finally,  restore  the  lower  triangular  form,  that  is,  compute  an  orthogonal  V  such  that 


(4.36) 


The  CSNE  approach  was  used  in  all  three  examples  and  most  extensively  in 

Example  4.3  when  downdating  a  row  with  an  outlier.  However,  the  performance  of 

our  algorithm  was  less  satisfactory  for  larger  outliers.  This  is  consistent  with  our  error 

analysis,  since  for  a  very  large  outlier  we  would  have  u  in  (4.26)  very  large. 

2 

The  refinement  steps  in  (4.35)-(4.36)  require  k  (n  -  k)  Givens  rotations,  so  that 

l 

they  become  impractical  when  k  =  O(n^)  or  larger.  As  an  alternative,  we  solve  for  a 
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from  the  equation 


LTFT 


a  —  z 


by  solving 


rT 

L  a  =  z 

D  TCI 


mm 


N 


{  \ 


aN  + 


(  \ 

aB 

\  °  / 


Then,  we  see  that 


a  = 


l  ,-TrT  \ 
aB  L  F  aN 


(4.37) 


(4.38) 

(4.39) 


(4.40) 


2 

is  the  minimum  length  solution  to  (4.37).  (4.38)  requires  0(k  )  back  substitution, 
and  (4.39)  can  be  approximated  by  a  few  steps  of  Lanczos  algorithm  since  it  is  well- 
conditioned. 


Table  4.1.  Tracking  ||jF||  and  ||G||  for  the  ULVD  Procedure 

F  1 F 


Steps 

Updating  Formula 

Flops 

2 

Ilf'X  =  IIF'X  -  ll/!' V 

O(k) 

0(1) 

5 

l|f,3)llp  =  l|Fwl£  +  llsf’ll2  -  Ill’ll2 
l|G(,)ll2F  =  l|G(3)||2F  -  Ikf’ll2  +  llshl2 

0(p  —  k ) 
0(1) 
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Moreover,  to  prevent  a  from  becoming  too  large,  we  track  ||.F||^,  so  that  it  remains 
under  certain  threshold,  say,  ||T-1||  ||F||^,  <  0.01.  This  is  similar  to  recommendations 
made  by  Fierro  and  Bunch  [41,  42].  Only  steps  in  Algorithm  4.1  that  require  to  update 
||ip||^,  are  Steps  2  and  5.  Table  4.1  shows  how  to  update  these  quantities.  Here,  <7^ 

Cj\ 

denotes  the  first  column  of  G  ,  i  =  1,3,4. 


Rank  Estimates 


TT 


**  ****  ***** 


Noise  Space  Errors 


40  50  60  70 

TLS  Solution  Errors 


J _ I _ i - L 

40  50  60  70 


Fig.  4.3.  Example  4.1 


Noise  Space  Errors 


Fig.  4.5.  Example  4.3 
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Chapter  5 

Rank  Detection  for  Modifying  the  ULV  Decomposition 

5.1  Introduction 

In  Algorithm  4.1  the  deflation  steps  are  required  to  compute  the  numerical  rank 
after  a  downdate.  The  deflation  step  usually  involves  some  condition  estimator,  which 
can  be  a  nuisance  in  some  situations  such  as  in  parallel  implementation.  A  survey  of 
popular  condition  estimators  is  given  in  [61],  and  they  all  require  0(k  )  flops,  so  that 
the  entire  process  requires  0(k  )  flops,  where  k  is  the  dimension  of  L  in  (4.2).  With 
some  modification  to  Algorithm  4.1  we  can  often  eliminate  Step  7,  the  deflation  step. 

Furthermore,  the  modified  algorithm  offers  an  efficient  way  of  tracking  exact  quan¬ 
tities  of  || L  1 1|^,  ||.F||^,  and  ||G||^,  which  give  a  significant  information  on  the  condition 
of  the  downdating  problem.  The  experiments  show  that  the  computed  subspaces  are  as 
good  as  can  be  expected,  and  no  worse  than  those  demonstrated  in  the  previous  chapter. 
The  updating  algorithm  for  the  ULVD  can  be  also  implemented  similarly  with  a  slight 
modification  to  the  downdating  algorithm. 

We  propose  our  new  ULVD  updating/downdating  algorithm  in  Section  5.2.  Sec¬ 
tion  5.3  contains  the  rank  detection  method  related  to  the  new  algorithm.  In  Section 
5.4,  we  give  numerical  tests  of  our  algorithm  on  the  RTLS  problems. 
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5.2  New  Algorithm  for  ULYD  Downdating 

Our  new  algorithm  reduces  the  downdating  problem  to  a  2  x  2  downdating  prob- 

T  T 

lem.  As  in  Algorithm  4.1,  we  assume  that  L  L  -  xx  is  positive  definite,  and  U  is  not 
accumulated. 

Algorithm  5.1  (New  Procedure  for  ULVD  downdating).  Assume  the  terminol¬ 
ogy  of  Algorithm  4.1,  and  denote  also  that  lkl  ~ 

Step  1.  Construct  orthogonal  matrices  Uy^E  Hkxk  and  E  n^p~k)x(^p~k) 

such  that 


;(1>  =  ufivv 

H 

II 

II 

H 

^  H 

(5.1) 

<’>  =  v?avr 

-  T 

v2y  =  Pe^  p  =  IMI- 

(5.2) 

Also,  compute 

F(1)  =  U^FVV 

Update  V  «-  V  d\zg(VvIn_k)  diag (/*,  V»-p>' 


Step 


2.  Find  an  orthogonal  matrix  U~  E  Tz(k+^*(k+^  such  that 


J2)  h 

0  if?, 


=  u: 


a) 


0 


i/r>T  ) 


(5.3) 


Define  =  (/  —  )Fy‘! . 


T,  r0) 
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Step  3.  Write 


r(2)_ 


fc-1  1 

‘  Lll  0 

,  (2),T  ,(2) 

<”  >  ‘kk 


k- 1 


kxk 

and  find  an  orthogonal  matrix  U^£lZ  such  that 


(3)_ 


( Lf)  „P)  \ 

0  ‘kk  / 


4 


(5.4) 


(1)  _  j1 

Compute  also  h}  '  =  h. 


Step  4.  Use  Algorithm  3.9  for  2  X  2  downdating: 


where  we  solve 


and  set 


JfcJfc 
o  5) 


(1)  ^ 
Ar 

r 

(*,) 

ft) 

,(2) 
ii  y 

V  ^2  J 

U; 

=  /?j,  P2  =  min{^2,  ^1-/^}. 


Step  5.  Find  an  orthogonal  matrix  V.  G  72.  such  that 


(5.5) 


(5.6) 


i(5)  = 


^  0  ^ 
/  (5),T  ,(5) 


=  z<% 


(5.7) 
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where  L 


(4) 


and 


differ  only  on  the  ( k ,  k )  entry.  Also,  compute 


f(3)  _  F(2)V3. 


Update  V  <-Vd\zg(V3,In_k). 


Step  6.  Find  an  orthogonal  matrix  V 4  £  7j(*+l)x(k+l) 


such  that 


(i(6)  0 )  =  (I(5)  /i(2)e^)U4  (5.8) 

(^(4)  <?(4))  =  (i^  (5.9) 

Update  V  «-  V diag(U4, If  5[4)  #  0,  then 


and  skip  Step  7. 


Step  7.  If  <7^  =  0  then  =  0  also  since  it  was  formed  from  using  V^.  Thus 


k  P~k 


k  P~k 

(  r  \ 

k 

f>>  0  ^ 

1 

L  0 

II 

0  0 

p—k 

{FW  G(4)  j 

1 

,F  G(4)J 
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~(4) 

where  GK  ’  is  a  lower  Hessenberg  matrix.  We  then  find  an  orthogonal  matrix 

V  6  TZ^P  k)*(p  k)  an  or^}j0g0na]  permutation  matrix  U  6  7 ^(P~^)x(P~k) 
o  5 

such  that 


Update  V  «-  V  diag(7, ,  V,,/  ).  Thus, 

tc  u  n — p 


k 

p—k— 1 
n-p+1 

Step  8.  Update  the  numerical  rank  of  Z,  and  make  appropriate  adjustments  to  F 
and  G  (a  procedure  for  the  rank  detection  will  be  described  in  the  next  section). 

Algorithm  5.1  requires  12A;2  +  6(p  -  A;)2  +  24fc(p  —  k)  +  0(p )  flops  for  Steps  1-6  and 
4fc  +  ^(A:)  flops  for  Step  7.  When  V  is  modified,  additional  6 np  +  12 nk  flops  will  be 
required.  Fig.  5. 1-5.2  depict  the  reduction  steps  for  Algorithm  5.1.  As  in  Algorithm 
4.1,  a  set  of  — ►  denotes  the  downdating  Step  4,  and  Step  7  is  illustrated  in  Fig.  4.2. 

Remark  5.1.  Note  that  in  (5.5)-(5.6)  we  incorporate  “additive  noise”  to  compute  a  as 
similarly  done  in  Step  4  of  Algorithm  4.1.  As  shown  in  Chapter  4,  the  algorithm  can  be 
made  consistent  with  this  additive  noise  in  the  absence  of  rounding  error.  Proposition 
4.2  shows  this  important  consistency  property. 

Remark  5.2.  Algorithm  5.1  has  a  few  advantages  over  Algorithm  4.1.  Algorithm  5.1 
does  not  incorporate  the  deflation  process  to  estimate  the  numerical  rank  as  often  as 
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Fig.  5.1.  Improved  Reduction  Steps  1-3  for  Downdating  the  ULVD 
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Fig.  5.2.  Improved  Reduction  Steps  4-6  for  Downdating  the  ULVD 
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Algorithm  4.1.  It  turns  out  that  after  a  downdate,  Z^,  the  (k,  k)  entry  of  X  often  gives 
a  very  good  approximation  to  amjn(X)-  Steps  3  and  5  offer  this  benefit  with  some  extra 
work,  namely,  2  (k  —  1)  Givens  rotations. 

Remark  5.3.  The  algorithm  can  be  also  used  for  updating  the  ULVD  just  by  replacing 
Step  4  with  the  Algorithm  3.8,  which  requires  only  12  flops. 


5.3  Rank  Detection 
5.3.1  Bounding  || X  *|| 

In  this  section,  we  show  that  in  Step  8  we  can  often  determine  the  numerical  rank 
of  A  without  the  deflation  process.  We  start  with  stating  a  simple  lemma  that  bounds 
the  2-norm  of  a  block  matrix  in  terms  of  2-norms  of  its  blocks. 

Lemma  5.1  ([9,  Lemma  3.3]).  Let  M  and  M  be  the  sx  s  block  matrices, 


l|Mn||  ||M12||  ||MJ| 

M  = 

ii" J  -  II" J 


Then  ||M||  <  ||M||. 

Proof.  See  the  proof  in  [9].  □ 

In  fact,  Lemma  5.1  holds  for  any  consistent  norms.  A  straightforward  application  of  this 
lemma  results  in  the  following  lemma. 
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Lemma  5.2.  Let 


v  1 


L  = 


Ln  w 

0  7 


(5.10) 


where  L ^  is  nonsingular  and  7  ^  0.  Then , 


pr'll< 


iiir'n  7_1ih7,ihi 


/ 

11 

V  0 


11 

-1 


(5.11) 


Proof.  It  is  easy  to  verify  that 


L-'  = 


V1 

Ln 
o 


-l  r-i 
-7  Ln  w 

-1 


(5.12) 


Then  taking  norms  and  using  Lemma  5.1  yield  (5.11).  □ 

(i) 

The  following  lemma  shows  the  effect  of  Steps  3  and  5  of  Algorithm  5.1  on  Zj^, 
(k,k)  entry  of  L^\  i  =  2,4. 

Lemma  5.3.  Let  L  be  defined  in  (5.10).  Suppose 


7 


/ 


(5.13) 


where  Q  is  orthogonal.  Then, 


~-l 

7 


7-'(i  +  iii-'»iij)1/2. 


(5.14) 
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we  have 

2fc\  l(  2  i  j,2  .  2^  .  1  //  2  .  ,2  .  2v2  "  2  2 

=  -(a  +6  +  c  )  +  -y  (a  +o  +  c  )  -  4a  c  .  (5.17) 

Let  a  =  HI"1 1|,  b  =  — 7~’1||L~1u>||,  c  =  7  *.  Then  by  Lemma  5.3,  we  see  that  62  +  c2  = 
7  ,  and  by  Lemma  5.2,  ||X  ||  <  cr^(5).  Hence,  taking  a  square  root  in  (5.17)  proves 

the  lemma.  □ 

The  following  lemma  states  necessary  conditions  to  ensure  a  correct  rank  estima¬ 
tion  for  the  ULVD. 

Lemma  5.5.  Let  C  be  defined  in  (4-2),  and  let  77  =  \\FL  *||.  Suppose 

\\L-l\\<tol-\  ||G||  (1  +  rj)  <  tol,  7/<l.  (5.18) 

Then , 

ok(C)  >  tol  >  <rk+1(C).  (5.19) 


Proof.  By  singular  value  interlacing  property  [63,  Theorem  7.3.9],  it  is  easy  to 
show  that 

7(C)><Tmin(i)  =  ||i-1r1,  7+1(C)  <  ^(G)  =  ||G||.  (5.20) 

Thus, 

ak{C)  >  tol  >  tol  ■  yy^  >  ||G||  >  *k+1(C). 

This  proves  the  lemma.  □ 

Thus,  the  decomposition  always  remains  rank-revealing  as  long  as  the  conditions 
in  (5.18)  are  satisfied.  Note  that  these  conditions  are  not  enforceable  unless  there  is  a 
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reasonable  gap  in  the  spectrum,  but  the  conditions  for  its  existence  are  weaker  than  the 
RRQR  conditions  of  Hong  and  Pan  [62]. 

A  cluster  of  singular  values  around  tol  would  cause  one  of  the  conditions  in  (5.18) 
to  be  violated,  and  the  rank  to  be  underestimated.  Since  we  keep  tracking  ||.F||^,  and 
||G||^_,  automatically  as  a  part  of  the  algorithm  as  illustrated  in  Section  5.3.3,  we  will 
always  know  when  this  happens. 

The  quantities  in  the  above  lemma  cannot  be  tracked  efficiently  using  our  algo¬ 
rithm.  However,  some  good  bounds  can  be  tracked.  That  leads  to  the  following  theorem 
that  uses  only  computed  quantities. 

Theorem  5.1.  Consider  the  downdating  procedure  in  Algorithm  5.1  and  the  related  up¬ 
dating  procedure  given  by  Remark  3.5.  Let  C  be  the  matrix  before  updating  (downdating) 
and  partitioned  as  in  (4-3) ,  and  let  L  have  the  condition  estimate  k  «  || L  ^||  *  such 
that 

ok(C)  >K>tol>  ok+l(C). 

Let  C  be  the  matrix  after  updating  (downdating)  and  partitioned  according  to 


C  = 


s  p-s  n-p 

^  L  0  0  ^ 

F  G  0 

y  0  0  0 


P~s  , 

n-p 


(5.21) 


where  s  =  k  +  1  for  updating  and  s  =  k  for  downdating.  Define 


r  1=^k  !’iy  !>u  !) 


(5.22) 


where  ij)(x,y,z)  is  the  function  defined  in  (5.16). 

If 
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(i)  K>tol 

(ii)  \\F\\F/K  =  fj<l 

(in)  ||G||f  <  tol(l  +  fj)~l 

then 

<Js(C)>tol><rs+1(C). 


If 


(iv)  K  >  tol 

(v) 

(Vi)  f\\6fF  +  life/ +  irj2  <  Ml  +  Vf' 

then 

<^_1(C')  >  tol  >  og(C). 


(5.23) 


(5.24) 


Proof.  Using  the  fact  that  for  any  matrix  A,  ||  A||  <  ||-4||^,,  it  is  simple  to  show  that 
the  conditions  (i)-(iii)  satisfy  the  hypothesis  of  Lemma  5.5.  Thus,  (5.23)  immediately 
follows. 

Let 

s— 1  p-s- fl  n-p 

(  L  0  0  \  *-l 

C  =  f  G  0  P-s+1  , 


0 


0 


0 


n—p 
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where 


G 


1  P-* 


‘-X  ^ 

ss 

f~s  G 


1 

p — S 


Then,  it  is  simple  to  verify  that 


Wp  =  Wlf  +  WI2-ll/sll2 
INI2  =  l|c||2  +  ||/sll2  +  |/J2 

Again,  the  conditions  (v)-(vi)  ensure  that  ||-F||^,  and  ||<5||^,  satisfy  the  hypothesis  of 
Lemma  5.5.  Hence,  (5.24)  also  follows.  D 

The  conditions  (iii)  and  (vi)  make  certain  that  there  exists  a  reasonable  gap  in  the 
spectrum.  When  the  conditions  (i)-(iii)  are  satisfied,  updating  results  in  a  rank  increase. 
Similarly,  when  the  conditions  (iv)-(vi)  are  met,  downdating  results  in  a  rank  decrease. 
From  the  theorem  we  see  the  importance  of  keeping  ||F||  as  small  as  possible.  Several 
ways  to  do  this  are  proposed  in  [41,  42,  104]. 
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The  theorem  still  holds  even  if  k  is  replaced  by  k  a  \\L  ^ in  (5.22).  We  can 
explicitly  compute  \\L  || ^_,  in  0(n  )  flops  as  described  in  the  following  section.  In  some 

cases,  this  may  be  a  more  pessimistic  bound  than  the  one  in  Theorem  5.1. 

We  should  note  that  there  is  still  a  possibility  that  k  could  be  any  underestimate 
of  <r^(Z).  Thus,  if  the  conditions  of  this  theorem  are  not  satisfied,  then  the  condition  of 
X  will  still  have  to  be  estimated.  When  the  conditions  (iii)  or  (vi)  are  violated,  indicating 
that  there  exists  little  gap  between  them,  one  must  refactor  or  redefine  tol.  If  none  of 
these  work,  one  can  conclude  that  the  ULVD  may  not  be  suitable  for  tracking  subspaces 
for  the  problem  under  consideration. 

One  of  the  factors  that  might  affect  the  rank  estimation  using  this  scheme  is 
incorrect  rank  estimate  from  the  previous  update/downdate.  This  problem,  however, 
can  be  solved  by  computing  the  initial  factorization  using  the  SVD  for  an  accurate  rank 
estimation.  Although  the  SVD  is  more  expensive  to  compute  than  the  ULVD,  the  cost 
will  become  negligible  when  amortized  over  the  cost  at  the  subsequent  updating  and 
downdating  steps. 

Next  two  sections  will  show  how  to  track  efficiently  the  quantities,  ||X_1 1|^,,  ||.F|| 
and  ||G||f . 

5.3.2  Tracking  ||X-1||^, 

We  begin  with  the  following  lemma. 

LEMMA  5.6.  Assume  the  hypothesis  and  terminology  of  Lemma  5.3.  Then, 

I|X-1||2F  =  ||X-1||2F  +  ^ 

7 


(5.25) 


96 


Proof.  From  (5.12)  we  see  that 


II L 


-1, 


m2 


=  Un%  + 


1  +  || L  !u>| 


Then,  (5.25)  follows  directly  from  Lemma  5.3.  □ 

Because  of  orthogonal  invariance  of  ||  •  ||^,  we  observe  that 


ii{i<3)}'1nf.  =  ii{i(2)r1iif. 
ii{i(5)r,iif  =  iK^r'iif 

where  we  used  the  terminology  of  Algorithm  5.1.  Thus,  the  steps  in  Algorithm  5.1  that 
we  need  to  consider  updating  ||Z  ^  are  Steps  2,  4,  and  6.  The  following  lemmata  give 

the  formula  for  each  step. 

LEMMA  5.7.  Assume  the  terminology  of  Algorithm  5.1.  Then 


f/(3)\2  _  r,(4L2 

ii{i(4)}_1ii2F  =  iK^(3)r1iiF + ^ — {-M^~ 


i(3),(5) 

kk  kk 


(5.26) 


Proof.  Since 


iKi(3)}'iHf  = 


ll{if,)}“1ll2p.+ 

IKi(3)}'1ll2f.+ 


i+u<ip>r1„<3V 

O2 

i+ii(if,)>-1»(3)ii2 

<'2>2 


(5.27) 

(5.28) 


we  see  that 
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r(4)^_1ll2  -  ||{I(3)}-1||2 


WiiTTX 


l  +  lltfRWV 


fe2 


+ 


i + ii{i<131)}_i»(3)ii2 


('il’j2 


(5.29) 


By  Lemma  5.3  we  obtain 


i+ii{i(,31)}'I<»(3)ii2 


('it’)2 


(5.30) 


Substituting  (5.30)  into  (5.29)  yields  the  result.  □ 

Since  is  not  available  at  Step  4,  and  ||{L^}  =  ||{£^}  ^||^,,  we  update 

this  at  Step  5. 

Lemma  5.8.  Assume  the  terminology  of  Algorithm  5.1.  Then 


iK£(2)r 

'X  = 

ii{i(1)}'1n2f  - 

INI2 

(5.31) 

<s<2>>2 

ii{l(6)>- 

= 

< — 

'cn 

1 

1 

Hill2 

(5.32) 

<»l?>2 

where  L^c  =  h,  and  { 

d=f[4\ 

Proof.  Let 

!<>)-(  i0) 

o  \ 

a(1)  ’ 

^11  / 

i<2>  = 

V  0 

A  |=c fim 
(2)  3 
*11  / 

98 


Then  we  obtain 


ii{i(1)r1ii2f 

iK£(2)r‘iif 


IK£(1)>-'||^  + 
ll{i(2)>-1l[J,  + 


i + n<i(1)r771(1)if 

M1,’)2 

1  +  IK^W 

U‘?>2 


(5.33) 


(5.34) 


By  orthogonal  invariance,  we  see  that  ||{Z^}  1 1| ^  =  ||{X^^}  ^H^,,  so  that 


ii{£(2)r1ni  = 


By  Lemma  5.3,  we  see  that 


1  +  ll{i(2)}_1ftll2 


<9<2)>2 


(5.35) 


i  +  ii{i(1,}_T/](1)ii2 


1 


<s<2)>2' 


(5.36) 


Substituting  (5.36)  into  (5.35)  yields  (5.31).  The  proof  of  (5.32)  is  similar.  □ 

Computing  ||{Z^}  *||^_,  and  ||{Z^^}  ^ ||^_,  requires  0(k •?)  flops  for  forward  and 
backward  substitutions,  respectively,  and  ||{Z^}  *||^,,  0(1)  flops. 

Finally,  when  there  is  a  rank  change  in  Step  8,  we  use  the  relation, 


i  +  ll^V 
««>2 
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where 


^ kk  / 


(5.37) 


2 

Thus,  it  takes  0(k  )  back  substitution. 

All  of  these  formulae  also  work  for  updating  as  well  as  downdating  for  obvious 
reasons.  Updating  and  downdating  procedures  differ  only  on  Step  4  when  updating  and 
downdating  2x2  matrices.  Thus,  is  computed  in  two  different  ways.  Therefore,  the 
formula  (5.26)  can  be  also  used  with  the  updating  procedure. 


5.3.3  Tracking  \\F\\F  and  ||G||F 

For  completeness  we  show  how  to  update  the  ||F’||^,  and  ||G||^-,  in  Table  5.1  al¬ 
though  it  was  partially  described  in  [13].  Here,  we  denote  =  [G^e^t  =  3,4.  Note 
here  that  having  calculated  ||<7^||  and  ||<7^||  to  compute  computing  ||G^||^ 

only  requires  0(  1)  flops.  As  in  tracking  ||£  1 1| j-,,  all  of  the  formulae  in  Table  5.1  also 
work  for  updating  as  well  as  downdating. 


Table  5.1.  Tracking  ll-FIL  and  ||G||  for  the  Improved  ULVD  Procedure 

r  1  r 


Steps 

Updating  Formula 

Flops 

2 

IHaiHHIMBBIHB— 

mMlBaMaMalBBa 

O(k) 

0(1) 

4 

HSiBBiHBaW 

0(1) 

6 

IIMIliHIHaiHIHilBIHillll 
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5.4  Numerical  Examples 

In  this  section,  we  present  a  few  examples  from  numerical  experiments.  These 
tests  were  performed  using  Matlab  on  a  SPARCstation  5  in  IEEE  Standard  double 
precision  with  machine  precision  «  10  .As  in  Chapter  4  the  algorithm  employs  the 
sliding  window  technique  from  signal  processing. 

At  each  step  of  the  sliding  window  method  with  the  window  size  mQ,  an  x  n 
data  matrix  is  constructed  from  an  m  x  n  observation  matrix  A  by  adding  a  new  row  to 
the  data  matrix  in  the  previous  window  and  deleting  the  oldest  row  from  it.  In  step  j, 
the  row  +  j  of  the  observation  matrix  is  added  and  the  row  j  is  deleted,  giving  the 
data  matrix  A ..  The  ULVD  of  the  initial  window  matrix  An,  which  consists  of  the  first 
rows  of  A,  can  be  obtained  by  computing  its  SVD. 

Then  Algorithm  5.1  takes  the  lower  triangular  matrix  (middle  part  of  the  decom¬ 
position),  the  orthogonal  matrix  (right  part)  as  initial  input  and  the  modifying  vector 

T 

r,  and  successively  modifies  these  matrices  at  every  window  step.  The  vector  z  =  V  r 
is  computed  at  the  beginning  of  each  window  step. 

We  tested  our  algorithms  in  the  context  of  the  total  least  squares  (TLS)  problems. 
See  Section  2.7  for  details.  We  use  the  TLS  solutions  from  the  Jacobi  SVD  as  reference 
in  checking  the  accuracy  of  the  solution  and  rank  estimates  of  our  algorithms. 

Fig.  5. 3-5.5  show  the  rank  estimates  by  Algorithms  4.1  and  5.1.  The  horizontal 
axis  represents  the  window  steps  and  the  vertical  axis  the  numerical  rank  of  the  window 
matrix. 

The  distance  between  the  subspaces  is  given  in  the  next  plot  using  the  Definition 

A.  =  F.E.WT,  W.  =  (W..  W.n) 

3  3  3  3  3  3 1  3  2 


2.3.  Let 
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be  the  SVD  of  A  .  computed  by  the  one-sided  Jacobi  method  at  step  j.  Let 


A.  =  U^C^{V^}T,  =  (v!?  v!?),  *=1,2 

3  3  3  3  3  jl  j2 


be  the  ULVD  of  A .  computed  by  Algorithms  4.1  and  5.1,  respectively. 

(i) 

Note  that  here  we  are  discarding  U\  ,  i  =  1,2.  Finally,  let 


C(*)  =  Y (*)E(.*){w(i)}T,  W®  =  (W^  i  =  1,2 

3  3  3  1  3  1  3  K  3 1  3  2  ' 


ft)  — 

by  the  SVD  of  C\  ’  computed  by  the  one-sided  Jacobi  method.  Define  W .  by 


W®  =  (  W®  )  =  i  =  1,2. 

3  V  jl  j2  ’ 33 


Define  the  angles  between  the  subspaces 


sin  =||{H'«};rF«  ||,  sin  ««  =||  {H'WfvW  ||,  i  =  1,2. 


(5.38) 


The  angles  0^  l  =  1,2  represent,  respectively,  error  between  the  true  noise  subspace  from 

the  Jacobi  SVD  and  the  approximate  one  from  tracking  the  ULVD,  the  approximation 

error  from  the  ULV  decomposition,  and  the  subspace  errors  from  ULV  subspace  tracking. 

We  plotted  log1Q(sin  0^)  in  dashed-dot,  log1Q(sin  0^)  in  solid,  and  log1Q(sin  0^) 

(2) 

in  dotted  line  on  the  vertical  axis  of  the  second  graph,  sin  0 ^  is  the  approximation 
errors  discussed  by  Fierro  and  Hansen  [43]. 
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Finally,  the  TLS  errors 


II*  (SVD)_I(ULVD) 
c  -  - _ 1 _ 1  %  —  12 

l|xfVD)|l  ’  ’ 

are  given  in  logarithm  in  the  last  plot.  Here,  and  ,g(^kVD)^  i  —  l5  2  are  the  TLS 

solutions  using  the  SVD  and  the  ULVD  with  Algorithms  4.1  and  5.1,  respectively.  On 
the  third  graph  of  each  figure  we  plotted  (1  in  solid  and  (2  in  dashed-dot. 

The  following  examples  were  also  used  in  [13]. 

EXAMPLE  5.1.  A,  a  110-by-6  random  matrix,  b,  a  110-by-l  random  vector.  Entries  of  A 

and  b  were  chosen  from  a  uniform  distribution  on  the  interval  (0, 1).  85  randomly  chosen 

—4 

rows  of  (A;  b )  were  multiplied  by  7  =  10  in  order  to  vary  the  rank  of  the  matrix,  and 
—2 

tol  =  10  .  The  window  size  p  used  was  12. 

g  0 

Example  5.2.  Same  as  Example  1  except  that  7  =  10  and  tol  =  10 

5 

Example  5.3.  Same  as  Example  1  except  that  the  matrix  had  an  outlier  of  size  10  at 
(18, 1)  position. 

The  first  plot  shows  that  both  algorithms  estimated  the  numerical  ranks  correctly 
throughout  the  sliding  window  steps  in  spite  of  frequent  rank  changes.  Thus  the  rank 
estimates  from  both  algorithms  and  those  by  the  Jacobi  SVD  were  identical.  As  expected, 

(t) 

the  errors  in  the  TLS  solution  are  almost  exactly  the  same  as  the  size  of  sin  ,  i  =  1,2 
in  (5.38). 

The  second  plot  in  each  figure  shows  that  the  noise  subspace  error  is  very  small 
giving  accurate  TLS  solutions.  The  quantities  in  (5.38)  are  shown  to  be  essentially 
identical  indicating  that  the  subspace  errors  from  our  algorithm  are  from  the  rounding 
errors,  not  approximation  errors. 
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Moreover,  both  algorithms  perform  well  even  when  G  becomes  singular  (indicated 

by  for  Algorithm  4.1  and  for  Algorithm  5.1  in  the  first  plot).  We  tested  several 

other  examples,  and  these  results  were  typical. 

As  in  [13]  we  used  the  Corrected  Seminormal  Equation  (CSNE)  technique  when- 

2  2 

ever  the  downdating  is  not  possible,  namely,  ||a||  >  1  in  (4.5),  and  /3^+  /3^>  1  in  (5.6). 
We  indicated  the  time  steps  where  the  CSNE  was  used  with  for  Algorithm  4.1  and 
for  Algorithm  5.1. 

For  Example  1,  we  plotted  the  norm  estimates  by  Algorithm  5.1:  ||X  ^  [|  computed 
by  the  SVD  (in  solid),  R  of  Theorem  5.1  (in  solid  dot),  and  ||X  1||^,  computed  by  the 
Algorithm  5.1  as  described  in  section  5.3.2  (dotted),  again  on  a  log  scale,  relative  to  tol. 
This  verifies  the  bound  from  Theorem  5.1. 


Fig.  5.4.  Example  5.2 


6 
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Chapter  6 


Modifying  the  Singular  Value  Decomposition 


6.1  Introduction 

We  discuss  methods  for  updating  and  downdating  the  SVD  and  partial  SVD  of 
A  £  7£mxn  of  the  form  (1.2)  and  (1.3).  Throughout  this  chapter  we  use  B  in  place  of 
M  in  (1.1)  to  denote  diagonal  form  or  partially  reduced  bidiagonal  form. 

Unlike  the  Jacobi-type  SVD  updating  procedures  [79,  80,  81],  we  transform  up¬ 
dating/downdating  problem  into  a  problem  of  finding  a  bidiagonal  matrix  B  and  an 
orthogonal  matrix  V  such  that 

BTB  ±  zzT  =  VBTBVT  (6.1) 


where  z  is  defined  in  1.9).  Throughout  this  chapter  the  bidiagonal  matrix  B  has  the 
form 

/ 

0  0 
0 


B  = 


7j  <t>l  0 


\ 


0  \  ^2 


0  • 

0  0 


-y  d>  0 

'n— 2  ^n—  2 

7_  i  <t>„  ■, 

n— 1  n— 1 


0  7 


»  I 
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We  may  also  use  the  MATLAB-like  shorthand 

B  =  bidiag(7(l:n),<£(l:n-  1)) 

to  denote  the  above  bidiagonal  matrix.  As  in  modifying  the  ULVD  from  the  previ¬ 
ous  chapters,  our  approaches  to  downdating  the  decompositions  use  ideas  from  chasing 
algorithms  [1,  93, 115,  125]  and  from  the  downdating  algorithm  due  to  Saunders  [46,  85], 
The  following  are  the  main  results  of  this  chapter: 

•  Procedures  for  updating  and  downdating  the  SVD  which  obtain  bidiagonal  forms 
such  that 


l 


n—l 


B  = 


/  -  t\ 
bi 


0 


l 

n—l 


,  T  \ 

Viei 


^  Vl 


(A) 


(6.2) 


where  B  and  B  are  upper  bidiagonal,  and  l  =  k  +  1  for  updating  and  l  =  k 
for  downdating.  This  form  preserves  more  of  the  accuracy  of  the  small  singular 
values  and  is  not  achieved  by  standard  chasing  procedures.  We  can  then  use  one  of 
several  algorithms  to  find  the  singular  values  of  the  bidiagonal  matrix  B  to  relative 
accuracy  [11,  35,  40].  The  singular  vector  matrix  can  be  modified  by  a  procedure 
due  to  Gu  and  Eisenstat[54]  in  0(mn )  operations  (the  constant  on  mn  depends 
upon  machine  precision).  That  is  the  same  order  of  complexity  as  for  the  ULVD 
methods  with  similar  stability  properties. 

•  A  perturbation  theory  for  the  singular  subspaces  from  modified  matrices  and  block- 
wise  error  bounds  for  the  above  procedures. 
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The  condition  (6.2)  is  achieved  because  the  algorithms  for  both  updating  and 
downdating  problems  produce  an  orthogonal  matrix  V  that  has  the  form 


k  n—k 


V  = 


'\x  0  ^ 

V  °  '22  / 


k 

n—k 


(6.3) 


There  is  never  a  rotation  of  the  first  k  columns  of  B  with  the  last  n  —  k. 

In  the  following  section  we  describe  the  secular  equation  approach  as  an  alterna¬ 
tive  to  chasing  algorithm.  We  show  that  this  approach  can  fail  to  separate  the  singular 
values  in  separate  blocks.  In  Section  6.3  we  give  some  basic  chasing  procedures  for 
modifying  the  SVD,  and  our  chasing  procedures  which  have  the  property  (6.2).  Section 
6.4  gives  the  perturbation  theory  and  discusses  error  analysis.  Section  6.5  gives  some 
computational  examples. 


6.2  Secular  Equation  Approach 

The  alternative  to  chasing  algorithms  for  modifying  the  SVD  is  that  of  finding 
the  zeroes  of  a  particular  spectral  function  [10,  25,  49,  53,  54,  67,  97], 

”  z2 

m  =  i+«E  V-^  =  °  (6-4) 

i= 1  ai  ~  ° 

where  a  >  0  for  updating  and  a  <  0  for  downdating,  and  a  is  the  singular  value  of  the 
modified  matrix.  The  corresponding  singular  vector  is  given  by 

{B  B  -  <7  Jn)  z 
UBTB-92In)-'z\\ 


v  — 


(6.5) 
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Here,  we  assume  that  B  is  diagonal.  That  approach,  as  yet,  does  not  allow  us  to  separate 
the  singular  values  into  separate  blocks  as  shown  in  the  following  example. 


Example  6.1.  Let 


B  = 


1  0 


0  1-10 


-10 


1  1-10 


-10 


QR  decomposition  B  is  given  by  B  —  QR  where 


Q  = 


l  -7.0711 -10-1  4.0825  •  10_1  -5.7735  •  10-1  ^ 

0  -8.1650 -10-1  -5.7735  •  10-1 

-7.0711  •  10-1  -4.0825  •  10-1  5.7735  •  10-1 


1  -1.4142  -7.0711-10  11  ^ 


R  = 


0 

0 


-1.2247-10 
0 


-10 


The  SVD  of  R  is  given  by  R  =  USV  where 


V  = 


-1.0000 


—21  \ 
4.3301-10  z  0 


-4.3301-10  21  -1.0000 


0 


0 

1.0000 


5  = 


1.4142  0 
0  1.2247-10 

0  0 


-10 


,  v  = 


-5.0000  •  10 


-11  \ 


5.0000-10  11  1 


Ill 


are  wrong! 
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6.3  Ordinary  Chasing  Algorithms 
6.3.1  Basic  Chasing  Routines 

In  this  section,  we  present  an  example  of  an  orthogonal  chasing  scheme  that 
produces  orthogonal  matrices  U,V  £  7Znxn  such  that 

B  =  UT  BV ,  VTz  =  pev  p=\\z\\  (6.6) 

where  B  is  lower  bidiagonal.  For  the  4x4  case,  it  is  given  in  Fig.  6.1.  Here  r  and  x 
denote  possibly  large  elements  and  e  and  y  denote  small  elements.  See  Algorithm  3.4 
for  the  formal  description. 

In  theory,  this  could  be  used  to  produce  an  updated  or  downdated  bidiagonal 
matrix  very  easily.  We  have 


—  rP  rP  rP  —  ^  n  rp  _rp  _ 

V1  ( B 1  B  ±  zz1  )V  =  B1  B  ±  =  B1  B 


If  B T  =  bidiag(7(l:  n),^(l:  n  -  1)),  then  =  bidiag(7(l:  n),  <^>(1:  n  —  1))  is  identical 

_  fZ 2  2 

to  B  except  that  7  =  W7  —  p  .  This  is  illustrated  in  the  last  rotation  in  Fig.  6.1, 

denoted  by  a  pair  of  For  updating  it  is  simply  a  Givens  rotation.  It  should  be 

noted  for  downdating  that  the  assumption  (7^  >  p  is  equivalent  to  the  assumption  that 
T  T 

B  B  —  zz  is  positive  semi-definite. 

Such  a  procedure  shown  in  Fig.  6.1  does  not  preserve  the  separation  of  subspaces 
for  large  and  small  singular  values  as  accurately  as  we  would  like.  Large  elements  can 
get  chased  down  into  the  lower  part  of  the  bidiagonal  matrix  B  as  shown  in  the  following 
example. 
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11  11 


Fig.  6.1.  Ordinary  Chasing  Procedure 
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Example  6.2.  Suppose  we  have 

B  =  diag(0.2071, 1.510  •  10-3, 8.081  •  10-4, 6.383  •  10-4, 5.184  •  10-7) 

2  =  (7.964  •  10-3, 8.012  •  10-3,  -9.102  •  10~3,  -2.821  •  10~3, 1.607  •  10~2)T 


After  forchase  is  applied  to 


\ 

/ 


we  obtain  B 


bidiag(7(l:  5),  <^(1: 4))  where 


(6.7) 


7(1:5)=  (7.864-10  2, -5.357 -10  2, -1.317 -10  3, -7.283 -10  4,  -6.414  -10  4)T 
<£(1:  4)  =  (-0.1852,4.134  •  10-5,  -4.030  •  10-4, 8.115  •  10_5)T 


Here  7(5)  >  tol,  so  that  the  smallest  singular  values  may  be  overestimated.  In  fact,  there 
should  be  no  rank  increase  as  we  will  see  in  Example  6.3. 

In  the  next  section,  we  show  that  forchase  and  backchase  procedures  described 
in  Section  3.2.1  can  be  combined  in  a  fashion  that  allow  us  to  update  or  downdate  the 
SVD  more  accurately. 

6.3.2  The  Updating  Algorithm 

Let 

B  =  diag(7(l:  n)),  7j  >  •  •  ■  >  7t  >  «  >  7jb+1  >  •  •  *  >  7„- 
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We  partition  B  into 


B1  =  diag(7(l:  k)),  B2  =  diag(7(fc  +  1:  n)). 


Let  2  be  defined  in  (1.9).  Then  the  following  algorithm  updates  the  diagonal  matrix  into 
upper  bidiagonal  matrix. 

Algorithm  6.1  (Procedure  for  updating  diagonal  matrix).  Given  input  7(1:  n) 
that  contains  cr^A),*  =  1, ...,n  and  the  update  vector  z,  this  procedure  produces  the 
updated  bidiagonal  matrix  B  =  bidiag(7(l:n),<£(l:n-  1)).  We  also  input  k  the  number 
of  singular  values  greater  than  tol. 

Step  1.  Compute  orthogonal  matrices  E  TZkxk  and  E  T^n  ^)x(n  k) 

such  that 

U^BlV  =  B{^\  VlX  =  Plef\  Pj  =  ||*||  (6.8) 

U2B2V2  =  B{^\  V2y  =  p2e^~k\  p2  =  ||y||  (6.9) 

where  is  upper  bidiagonal,  and  B^  lower  bidiagonal. 

1  Z 

Step  2.  Let  Q  =  J(  1,3,0  )  and  Q  -  J(2,3,0  )  define  Givens  rotations  for  some 

1  1  z  z 

6.,i  =1,2  such  that 
1 


/ 


.0) 


0 

7(1)  , 

7*+l  / 


/  .(2)  a(2)  \ 


'k 

0 


h 

y(2) 

Tik+1 

0 
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that  is,  use  Algorithm  3.8  for  updating  2x2  matrix, 

i42)-42)'-0 = up22(41>’°'4+i’',r'>2)' 

Thus  if  we  let  U  =  J(k,n+  1,^)  J(k  +  1  ,n  +  M2),  then  we  have 


„(2)  <(2)  T 

B I  4  Vi 

>(2) 


0  B, 
0 


1 

*  B1^1  0  ^ 

E-H  CO 

to 

II 

0  b<2> 

T  T 

) 

kVt  Vl  / 

Step  3.  Use  Algorithm  3.6  to  construct  an  orthogonal  matrix 
such  that 


(« 


B  = 


'  Bi 

h] 

ll 

(l 

k 

\ 

0 

° 

° 

°4> 

\ 

<^e  e 
^1  9k  ek€\ 


0  B, 


(2) 


where  B ^  is  upper  bidiagonal.  Thus  U  and  V  are  given  by 


'l  0  0  ' 


u  = 


0  Ul  0 


o  o  U. 


u„ 


2  / 


f  I.  o  oN 

k 

0  UA  0 

4 

0  0  1 


V  = 


n  0 


0  vr 


2  / 


— fc)x(n— fc) 


(6.10) 


Note  that  after  Step  1  the  updating  problem  is  reduced  to  a  2  x  2  problem.  Step 
3  is  to  restore  the  matrix  to  upper  bidiagonal  form,  which  is  equivalent  to  a  step  of  qd 
procedure.  The  block  diagonal  form  of  V  in  (6.10)  is  highly  significant.  Any  modification 
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of  the  subspaces  associated  with  the  first  k  singular  values  and  the  last  n  —  k  singular 
values  will  be  computed  in  the  reduction  of  the  bidiagonal  matrix  B. 

Fig.  6. 2-6.3  show  the  reduction  steps  for  this  algorithm  with  n  =  7  and  k  =  4. 
Here,  a  pair  of  —>  denotes  the  application  of  Givens  rotations  that  corresponds  to  Step 
2.  Unlike  the  ordinary  chasing  algorithm,  Algorithm  6.1  preserves  the  block  structure 
as  illustrated  in  the  following  example. 

Example  6.3.  When  Algorithm  6.1  is  applied  to  (6.7),  we  obtain 

7(1: 5)  =  (6.441  •  10-4, 9.869  •  10-4, 3.974  •  10-3, 5.009  •  10~2, 3.818  •  10-7)T 
<j>{  1: 4)  =  (1.054  •  10-4, 7.309  •  10-4, 0.1862,  -4.444  •  10_8)T 

keeping  the  separation  of  the  large  and  small  blocks,  and  so  preserving  the  accuracy  of 
the  tiny  singular  values. 

6.3.3  The  Downdating  Algorithm 

An  immediate  difference  between  the  updating  and  downdating  procedures  is  that 
we  write  B  in  the  form 

B  =  d\*g(BvBrOn_p)  (6.11) 

where 


B1  =  diag(7(l:fc))=  diagOTj,...,^), 

B2  =  diag(7(fc  +  l:p))  =  diag((Tjb+1,...,<rp), 


>  e  >  a 


k+V  p+ 1 


and 
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Fig.  6.2.  Bidiagonal  Reduction  Steps  for  Modifying  the  SVD 
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Fig.  6.3.  2x2  Updating/Downdating  Steps  and  a  qd  Step 
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thus  allowing  for  the  possibility  that  some  singular  values  are  exactly  zero.  As  in  down¬ 
dating  the  ULVD,  we  partition  the  vector  z  in  the  form, 

T 

z  =  Vr  = 

Here,  is  presumed  to  be  the  result  of  rounding  errors  and  is  ignored.  However,  if 
iw  »  H  *  cr^A),  then  we  should  not  downdate,  we  should  refactor. 

T 

Even  when  there  are  no  zero  singular  values,  and  even  though  z  =  V  r  and  r  is 
T  T 

the  first  row  of  A  in  (1.1),  B  B  —  xx  is  positive  semi-definite.  However,  occasionally, 

T  T 

even  that  is  not  the  case.  The  usual  way  to  test  if  B  B  —  xx  is  positive  semi-definite 
is  to  solve 

B^s  =  x.  (6.12) 

If  ||s||  >  1,  then  we  cannot  downdate  B  by  x.  One  possible  remedy  is  to  try  to  obtain 

T  T 

a  better  value  for  x  =  V  A  e^.  That  can  be  done  using  the  corrected  semi-normal 
equations  (CSNE)  [18,  20]  as  we  have  used  for  modifying  the  ULVD  in  Chapter  4. 

If  |js||  >  r\  where  j?  <  1,  then  solve 

Bxc  =  s.  (6.13) 

It  should  be  noted  that  c  solves  the  least  squares  problem 

min  \\AV  c-ex\\ 

c£Hk 


/  \ 
X 


1  # 


k 

p~k 

n—p 
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and  its  value  can  be  improved  by  the  iterative  improvement  steps 


r  =  el~  Avlc 

(6.14) 

6x  =  V^ATr 

(6.15) 

bJ'Ss  =  6x 

(6.16) 

i  +  6s,  x  4—  x  +  8x 

(6.17) 

At  this  point,  if  ||s||  >  1,  we  signal  that  downdating  is  not  possible,  and  thus  other  options 
should  be  considered,  such  as  refactoring  or  choosing  a  higher  threshold  e.  Otherwise, 
the  algorithm  proceeds  in  a  similar  manner  to  Algorithm  6.1. 

We  now  present  the  downdating  algorithm. 

Algorithm  6.2  (Procedure  for  downdating  diagonal  matrix).  Given  the  input 
7(1:  n)  that  contains  <r.(A),  i  =  1, . . . ,  n  and  the  update  vector  z  of  the  form  (4.3),  this 
procedure  produces  the  downdated  bidiagonal  matrix  B  =  bidiag(7(l:n),^>(l:n  —  1)). 
We  also  input  k  the  number  of  singular  values  greater  than  tol.  yQ  is  ignored  unless 

llv0ll  >  M- 

Step  1.  If  ||s||  >  r)  where  s  is  defined  in  (6.12)  and  rj  <  1,  then  solve  (6.13)-(6.16) 
for  6s  and  6x.  Update  s  and  x  as  in  (6.17).  If  ||s||  >  1  or  ||i/q||  >  tol  «  100  *  n, 
then  quit  and  exit;  otherwise,  do  Steps  2-4. 

Step  2.  Same  as  Step  1  of  Algorithm  6.1. 


Step  3.  Compute 


(6.18) 

(6.19) 
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We  can  then  find  two  3x3  Givens  rotations  Q ^  =  .7(1,  *  +  1, 0  ),  i  =  1,2  such  that 


(a\ 


Q  1^2 


=  e. 


\  a2  ) 


In  that  case  we  modify  the  k- th  and  (k  +  l)-st  rows  of  the  matrix 


-,(2) 
7 k 

42M 

f  7(1) 
7Jfc 

0 

\ 

0 

7f+i 

-  <3^2 

0 

7(1) 

7ik+l 

V  pl 

J 

^  0 

0 

/ 

where  p 2  =  7^j\/l  —  c*2.  This  can  be  done  by  using  Algorithm  3.9, 


Thus  if  we  let  =  J(l,k  +  2, 6^ J{  1,  k  +  1, 6^ ,  then  we  have 


Step  4.  Same  as  Step  3  of  Algorithm  6.1. 

We  note  that  in  (6.18)  |a  |  =  ||f?“1x||  =  ||s||.  Thus  B }  can  be  downdated  by  x  if  and 
only  if  laj  <  1.  For  Algorithm  6.2,  we  assume  that  is  the  case.  If  a  =  0,  but  a 2  ^  0, 
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then  =  0,  whereas  if  a  =  =  0,  then  7^  =  <f>^  =  0.  Fig.  6.2-6. 3  show  the 

reduction  steps  for  this  algorithm  with  n  =  7  and  k  =  4,  but,  this  time,  a  pair  of 
denotes  the  application  of  rotations  of  Saunders’  algorithm  that  corresponds  to  Step  3. 

Thus  we  have  simple  algorithms  to  perform  either  an  update  or  downdate.  The 
downdating  procedure  has  the  following  consistency  property  similar  to  Proposition  4.2. 


Proposition  6.1.  Assume  that  Algorithm  6.2  is  done  in  exact  arithmetic,  that  U  and 

W  _  /«W/ 

V  in  (4-1)  are  exactly  orthogonal,  that  U  =  UU  satisfies  Ue^  =  e  ,  and  that  z  =  V  r 
is  computed  exactly.  Also  let  V  =  VV  and  z  =  V  r  —  +  ^2ejfc+l‘  ^ 


A  +  6  A  = 


rT  +  SrT ' 

=  V 

\ 

B 

\  A  +  6A  j 

<°  ) 

VT, 


then 


A  +  SA0  = 


(  T  \ 

T 

y  A  +  6  A  j 


-  U 


(  -T\ 

Z 

B 

\  0  / 


Thus  ||M0||  =  ||«^||  <  ||M||. 
Proof.  We  have  that 


~T  ~ 

U1  AV 


"1%  0*1 

B 1  h€i A 


B 


2  ) 


(6.20) 


(6.21) 


(6.22) 


V 


0 
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and 


UT 

B  1 

V  = 

Thus  from  (6.20),  ||£r||^  = 

6p\  +  b\-  « 

(  _T  \ 

H 

2 

1  Pl€k  (p2  +  Sp2^ 


B, 


hekei 


(6.23) 


\B  / 


V  = 


/  T  \ 
r 


A  +  6A 


(6.24) 


=  (o  sat). 


The  result  immediately 


Thus  comparing  (6.24)  with  (6.21),  we  have  6AQ 
follows.  □ 

We  note  that  Proposition  6.1  is  merely  a  consistency  property.  What  it  says  is 
that  approximation  used  in  Step  3  of  Algorithm  6.2  does  not  increase  the  error  over  that 
caused  by  assuming  that  Ue  —  ey 


6.3.4  Extensions  to  Partially  Reduced  Bidiagonal  Forms 

Algorithm  6.1  and  6.2  can  be  easily  extended  to  the  case  where  either  B  or 

is  bidiagonal  as  long  as  they  are  decoupled.  We  need  only  modify  Step  1  of  Algorithm 

6.1  and  Step  2  of  Algorithm  6.2.  Van  Huffel  and  Park  [115]  describe  chasing  algorithms 

_  _  ib  x  ib 

that  given  B  upper  bidiagonal,  produce  orthogonal  matrices  t/j,  E  72.  such  that 


t7T 

F1  1  =  “l  ek 


where  B  is  upper  bidiagonal. 

Fig.  6.4  shows  how  such  an  algorithm  would  work  on  a  5  x  5  example.  Thus, 
using  algorithms  such  as  the  zero-shift  QR  [35]  or  the  qd  algorithm  [40],  it  is  possible  to 
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1  1 


1  1 


1  1 


l  1 


Fig.  6 ;4 .  Reduction  Steps  for  the  Partially  Reduced  Bidiagonal  Form 
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find  the  singular  values  that  are  below  a  certain  threshold,  and  thus  obtain  a  partially 
bidiagonal  matrix  of  the  form  (1.3). 

6.4  Error  Analysis 

6.4.1  Error  Bounds  for  Blockwise  Algorithms 

We  now  present  error  bounds  for  the  process  of  one  update  or  downdate  using  the 
procedures  in  Sections  6.3.2  and  6.3.3.  All  of  the  matrices  below  are  computed  except 
those  with  S  in  front  of  them. 

The  following  two  propositions  are  proven  in  the  Appendix  of  [14]. 

Proposition  6.2.  Algorithm  6.1  produces  an  updated  matrix  B  such  that  for  some  or¬ 
thogonal  matrix  U ,  and  V  we  have 


y  \ 

\ 

£/T 

z 

V  = 

0 

\  B  ) 

^  £  +  SB  +  SB0  . 

where 


SB  =  diag((5B1,^B2) 

6S0  =  +  **k+ iVw  +  Ww 

|| SB^Kpf^n)  WB^  +  Oip2) 

\\SB2\\  <  m/2(»)  \\B2\\  +  0{p?)  =  pf2(n)  ok+l(B)  +  0(fi2) 
\h-\  <  m/3(«)  1 7;-|  +  0{p2),  j  =  k,  k  +  1 


Nt+1l  <  m/4(»)  l^+1l  +  ^(m2) 


where  f.(n )  =  0{n),  i  =  1,2,  and  f.{n )  =  G(n),  i  =  3,4. 
2  2 
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From  [11,  35]  this  relative  change  in 
only  relative  changes  in  the  singular  values. 

Proposition  6.3.  Algorithm  6.2  produces 
orthogonal  matrices  U  and  V  we  have 


the  entries  of  the  bidiagonal  matrix  makes 
Thus  this  update  procedure  is  very  stable. 

a  downdated  matrix  B  such  that  for  some 


UT 


'  B  +  SB  ^ 

'  ( z  +  Szfv  ' 

K  0  ! 

V  = 

V  B  ) 

where 


SB  =  diag  (6Bv6B2) 

H«yi<M/5(n)  11^11  +  <V) 

Il^jjll  <  m/6(»)  P2II  +  <V)  =  m/6(»)  °k+1(B)  +  <V) 

\\Sz(l:k)\\<fif7(n)  ||z(l:fc)||  +  <V) 

\\6z(k+l:p)\\  <  M/g(n)  \\z(k  +  l:p)||  +  0(p2) 

where  f-{n)  =  0(n2),  i  =  5,6,  and  f.(n )  =  0(n),  i  —  7,8. 
z  z 

These  results  are  as  good  as  can  be  expected  for  any  such  procedure.  As  we 
state  in  the  next  section,  we  can  expect  sharp  separation  between  singular  subspaces 
associated  with  large  and  small  singular  values. 

6.4.2  Perturbation  Bounds  for  Invariant  Subspaces 

We  consider  in  this  section  the  effects  of  the  bounds  in  Propositions  6.2  and  6.3 
in  the  error  in  certain  invariant  subspaces  of  B  resulting  from  Algorithms  6.1  and  6.2. 
Two  perturbation  results  show  that  we  expect  that  the  subspaces  for  large  and  small 
singular  values  will  be  very  accurately  computed. 
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The  componentwise  backward  error  SBQ  in  Proposition  6.2  has  a  very  small  effect 
on  the  error  in  the  subspaces.  The  following  result  is  given  for  completeness. 

Proposition  6.4  ([39,  Lemma  4.5]).  Let 


B  =  bidiag(71,...,7B;^1,...,^n_1) 


(6.25) 


and  let 


B  --  bidiagf V] ..... W an+1*, . 


(6.26) 


Let 


2n— 1 

v-  n  max{ajiaj  /-i- 

2  —  1 


Let  , . . . ,  w '  be  the  right  singular  vectors  of  B  and  letw ^ , . . . ,  w :  be  the  right  singular 
vectors  of  B.  Let  cr^ , . . . ,  cr  6e  the  singular  values  of  B  and  define 


pi 


I  k,  -  -l  1 

=min|2,^_r|,  i 


=  1, 2, . . ., n. 


Let  =  (ipj, . . .,  Wj  j ,  ti;  . . . ,  wn)i  Mat  is,  the  right  singular  vector  matrix  of  B 


with  its  i-th  column  deleted.  If  p.  >  r/.  then 


<  (2<l±2l  +  ?). 


(6.27) 


Thus  the  effect  of  the  relative  errors  SB^  on  the  updated  matrix  B  is  minimal 
and  has  little  effect  on  the  singular  subspaces. 
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Proposition  6.3  implies  that 


II«,II<«bII5,||  +  0((*2) 

(6.28) 

l|02ll<<fie  +  0(w2) 

(6.29) 

IM1:*)II<‘JW1:*)II  +  O0‘2) 

(6.30) 

IM*  +  i=p)ll  <  4JW'=  +  1:p)I!  +  o(p2)- 

(6.31) 

Here,  <5D  <  fD(n)p  and  6  <  f  ( n)p  where  fn{n)  =  0(n2)  and  /  (n)  =  0(n). 

jl>  Z  Z  ±5  Z 

PROPOSITION  6.5.  Let  B  and  B  +  SB  be  diagonal  matrices  such  that 


n—k 


n—k 


B  = 


( -  -  t\ 

B1  vr 


v 


SB  = 


) 


n—k 


8B1  0 

0  SB„ 


n—k 


where 


110,11  <  «BP,||  +  o(„\  ||02||  <  +  0(m2), 


( w?  ^ 


<  € 


let  a  >  ■  •  ■  >  a,  >  e  >  a.  ,>•••>  a  .  Let  W,  W  e  72.nx”  6e  the  matrices  of  right 
1  ~  k  Ar+1  ”  n 

singular  vectors  of  B  and  B  +  SB,  respectively.  If 


W  =  (W1  w2),  W  =  (Wl  w2). 


(6.32) 
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where  W.,W.£  Unxk ,  and  W  ,  W0  g  7lnx(n  k\  then 

XX  Zd  /d 


\\wFw  \\f  <  26  \\B  ||  ||5  1,^1+/<  +  °(¥B\\2). 

1  *  *  ak~  ak+l 


(6.33) 


Proof.  Let  w.,  i  =  1,2, . . ., k  be  the  t-th  right  singular  vector  of  B  +  SB  and  let 
w  ,  j  =  k  +  1, . . .,»  be  the  j-th  singular  vector  of  B.  Then  from  standard  perturbation 

-T  - 

bounds  on  the  eigenvectors  of  B  B  we  have 


\wTsB^Bw.  +  wT B^ SBw  .\ 

=  Li - 4—4 - L  +  o(ll«fill2) 


2  2 

(7.  —  <7. 
z  2 


(6.34) 


\\6Bw.\\(r.  +  \\6Bw  \\a 


i  3 


3  * 


2  2 

<7.  —  cr. 
*  3 


+  0(\\6Bf) 


(6.35) 


We  now  bound  ,||  for  i  =  1,2, . .  .,n.  First,  let 


(1) 


w: 

w.  =  I  1 

'  1  „(J>  I 

i 


(6.36) 


Then  we  have 


^  9 

^  0  B 


fii  Vi 


'  t  J?)  ^ 


2  / 


(2) 

\V/ 


=  cr. 


/  yO)  \ 

(2) 

\  Vi  ) 


—  a.y. 


where  y.  is  the  corresponding  left  singular  vector.  Thus  we  have 

i 


w,0)  =  Bi  '(v!1’  -  Vrf ’>• 


Therefore, 
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ll^ll  =  II5!  1|l(^-+  l^+1l)  <  II5!  +  «)•  (6.37) 


Now  we  can  say  that 


which  leads  to 

i!«»  ii  <  <bp,ii  isr'iK^ + «>2 + eV- 


Equation  (6.38)  leads  to 


l|W«'ill<<B(»j  +  V2<)||S1||p-I||. 


(6.38) 


(6.39) 


Combining  (6.35)  and  (6.39)  yields 


\~T 


\wiw\<6B\\B1\\\\B1  || 


_1  (*,-  +  ^  <0  <Tj  +  (*•  +  \/2  e)  (T. 


nl|2\ 


2 

a .  — 

i 


+ o(\mn 


which  is  bounded  by 


<  «BII*>1II  Pr1||(y_Vfg)  +  0(||«fi||2).  (6.40) 

*  j 

Thus  for  all  i  =  1, 2, . . . ,  k  and  j  =  k  +  1, . . . ,  n,  we  have 

||J5i  !||  A±i_ - +  0(\\SB\\2).  (6.41) 

J  k  k+ 1 
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which  is  now  independent  of  i  and  j.  The  use  of  the  Frobenius  norm  on  W  =  (w^, w k) 
and  W2  =  (wk+v  ...,wn)  yields  (6.33).  □ 

Proposition  6.5  implies  that  the  updating  algorithm  will  always  yield  accurate 
subspaces  for  the  first  k  and  last  n-k  singular  values.  For  Algorithm  6.2  we  must  also 
bound  the  effect  of  6z  which  is  qualitatively  slightly  different. 

-T  - 

Proposition  6.6.  Let  B  and  W  be  an  in  Proposition  6.5  and  let  it  satisfy  B  B  = 
VB? BVT  —  zzF  where  V  €  7 inxn  is  the  orthogonal  matrix  from  Algorithm  6.2.  Let  B 
satisfy  BT B  =  VBT BVT  -  (z  +  Sz)(z  +  Sz)T  and  let  6z  have  the  form  (6.30)-(6.31). 
Let  z  =  ( x' T  yT)T ,  x  6  vA ,  y  E  and  assume  that  ||y||  <  e.  Let  W  E  1ZnXn  be  the 

matrix  of  right  singular  vectors  of  B  and  define  and  from  (6.32).  Then 

\\wfw2\\F  <  6zy/H^T).uR\\Bi\\  p-'Hx 

+  0(\\6zf)  (6.42) 

where 

u  =  \\B~Tx  II,  k  =11^11  ||  B~l\\  (6.43) 


a_k  +  3€  ! 

ak'ak+ 1  VV! 


Proof \  Let  w.,  i  =  1,2, . . .,  k  be  the  i-th  right  singular  vector  of  B  +  SB  and  let 
w j  —  k  +  1, . .  .  ,n  be  the  j- th  singular  vector  of  B .  Then  from  standard  perturbation 

-T  - 

bounds  on  the  eigenvectors  of  B  B  we  have 


i  ~T  , 

\W.  W  A 
1  i  3 


+  o(IMI2) 


(6.44) 
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< 


- 1 - h - 2 - - - L  +  0(11^11  )■ 

a .  —  o . 

i  j 


(6.45) 


If  we  use  the  partitioning  of  w.  in  (6.36),  then  we  have 


\zTw.\  =  | xTw^\  +  |/^2)|  =  \xTB  +  \yTwf\ 


which  means  that 


\z^w.\  <  ||J5  ^a:||  \\B  w^||  +  e  <  u  a.  +  c,  i=l,...,».  (6.46) 

2  12  2 


We  also  have  that 


\6zTw.\  <  +  \6yT  wf\  <  ^||®||  ||w^||  +  |M|  ||to^| 


Using  the  facts  that  ||a;||  <  \\B^\\  and  \\y\\  <  e  and  from  (6.37),  we  have 


ifaVi  <<,(11^,11  n>,+o+« 


which  we  simplify  to 


lizVlS^IIBjII  ||B1-1||(<T.  +  2S). 


(6.47) 


Combining  (6.45)  with  (6.46)  and  (6.47)  yields 


r  -  i  ( ai  +  2e)(CT7-  +  €)  +  C0-,-  +  2t)((Tj  +  €) 

\wjw.\  <  Su  lltfjll  \\B~l\\  — - 2 - 1 - 1 - 


2  2 

a .  —  c . 
*  J 


+  0(\\6z\f) 


(6.48) 
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(7  .  - 1-  36 

3 _ 


a.  —  a . 
*  3 


+  o(||MI2)- 


(6.49) 


We  note  that  this  for  i  =  1, 2, . . . ,  k  and  j  =  k  +  1, . . . ,  n,  thus  we  can  bound  (6.49)  by 


\sJwj\  ^  8ZU  WBi 

The  bound  (6.42)  is  obtained  by  computing  the  Frobenius  norm  of  W^W^,  where  = 
(wv...,wk)  and  W2  =  (^+1 , •  •  • , wn).  □ 


ak  ~  ak+ 1 


+ 


46 


2  2 
°k  ak-\- 1 


+  0(\\Szf). 


(6.50) 


6.5  Numerical  Examples 

In  this  section,  we  present  a  few  examples  from  numerical  experiments.  These 

tests  were  performed  using  Matlab  on  a  SPARCstation  5  in  IEEE  Standard  double 

—16 

precision  with  machine  precision  w  10  .As  in  Chapter  4  the  algorithm  employs  the 

sliding  window  technique  from  signal  processing. 

At  each  step  of  the  sliding  window  method  with  the  window  size  mQ,  an  x  n 

data  matrix  is  constructed  from  an  m  x  n  observation  matrix  A  by  adding  a  new  row  to 

the  data  matrix  in  the  previous  window  and  deleting  the  oldest  row  from  it.  In  step  j, 

the  row  +  j  of  the  observation  matrix 
0 

Then  Algorithms  6.1  and  6.2  take  the  diagonal  matrix  and  the  orthogonal  matrix 

(right  part)  as  initial  input  and  the  modifying  vector  r,  and  successively  modifies  these 

T 

matrices  at  every  window  step.  The  vector  z  =  V  r  is  computed  at  the  beginning  of 
each  window  step. 
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The  one-sided  Jacobi  method  [36]  was  used  to  compute  the  SVD  of  the  initial 
window  matrix  which  consists  of  the  first  m  rows  of  A,  and  to  compare  with  our 
algorithms  for  rank  estimation  and  the  accuracy  of  the  subspaces. 

We  tested  our  algorithms  in  the  context  of  the  total  least  squares  (TLS)  problems. 
See  Section  2.7  for  details.  We  used  the  TLS  solutions  via  the  Jacobi  SVD  as  reference 
in  checking  the  accuracy  of  the  solution  and  rank  estimations  of  our  algorithms. 

In  Fig.  6.5-6. 7,  the  rank  estimated  by  our  algorithms  (solid  line)  and  the  true 
rank  (dotted  line  but  not  visible  in  the  plot)  are  given  in  the  first  plot.  The  horizontal 
axis  represents  the  window  steps  and  the  vertical  axis  the  numerical  rank  of  the  window 
matrix. 

Let  and  be  the  right  singular  vector  matrices  computed  by  the  Jacobi 
method  and  Algorithms  6.1  and  6.2,  respectively.  Then,  using  the  Definition  2.3, 

•j  =  ll«'10)TVj(i)l.  i  ~  1.2,... 

where  W ' J ^  W^)  and  =  (vj^  V^)-  We  plot  log  (sin(0.))  in  the 

J.  A  J.  Z  J.  U  J 

second  plot  of  each  figure. 

Finally,  the  TLS  errors 


are  given  in  logarithm  in  the  last  plot.  Here,  x  .  and  x .  are  the  TLS  solutions  using  the 
Jacobi  method  and  our  algorithms,  respectively. 

Example  6.4.  A,  a  100-by-5  random  matrix,  b,  a  100-by-l  random  vector.  Entries  of  A 

and  b  were  chosen  from  a  uniform  distribution  on  the  interval  (0, 1).  75  randomly  chosen 

rows  of  [A;  b]  were  multiplied  by  7  =  10-4  in  order  to  vary  the  rank  of  the  matrix,  and 
—2 

tol  =  10  .  The  window  size  p  used  was  10. 
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The  first  plot  shows  that  our  algorithms  estimated  the  numerical  ranks  correctly 
throughout  the  sliding  window  steps  in  spite  of  frequent  rank  changes.  The  errors  in  tiny 
singular  values  were  relatively  large,  and  our  algorithms  almost  always  overestimated 
small  singular  values.  However,  they  were  close  enough  for  the  correct  rank  estimation. 

The  second  plot  in  each  figure  shows  that  the  noise  subspace  error  is  very  small 
giving  accurate  TLS  solutions. 

—9  —7 

Example  6.5.  Same  as  Example  6.4  except  that  7  =  10  and  tol  =  10 

5 

Example  6.6.  Same  as  Example  6.4  except  that  the  matrix  had  an  outlier  of  size  10 
at  (15, 1)  position. 

Both  TLS  solution  errors  and  the  noise  subspace  errors  show  that  our  algorithms 
give  very  accurate  approximation  to  the  subspaces  under  consideration.  Moreover,  the 
algorithm  performs  well  even  when  some  of  tiny  singular  values  become  almost  zero 
(indicated  by  in  the  first  plot).  We  tested  several  other  examples,  and  these  results 
were  typical. 

Since  our  downdating  procedures  use  LINPACK  downdating  algorithm,  it  is  not 
difficult  to  generate  the  cases  where  the  algorithm  breaks  down  when  ||a||  >  1,  for 
instance,  when  deleting  a  large  row  relative  to  7  (see  Fig.  6.6)  or  a  row  that  contains 
outliers  (see  Fig.  6.7).  We  used  the  CSNE  approach  in  (6.14)-(6.17),  and  indicated  these 
steps  by  ’+’  in  the  first  plot. 

The  CSNE  approach  was  used  in  all  three  examples  and  most  extensively  in 
Example  6.6  when  downdating  a  row  with  an  outlier.  However,  the  performance  of  our 
algorithm  was  less  satisfactory  for  the  larger  outlier. 
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Chapter  7 


Parallel  Implementation 


7.1  Introduction 

In  this  chapter  we  describe  a  fully  parallel  bidiagonal  reduction  procedure  for 
modifying  the  SVD.  In  Chapter  6  we  showed  that  blockwise  algorithms  produced  more 
accurate  subspaces  than  the  ordinary  one-way  chasing  algorithms  which  ignore  the  block 
structure  of  the  diagonal  matrix.  A  VLSI  implementation  of  the  similar  chasing  schemes 
for  the  bidiagonal  reduction  for  updating  was  also  described  in  [1,  117],  but  without 
considering  the  large  and  small  structure  of  the  matrix.  In  this  section  we  implement 
our  algorithm  on  a  distributed-memory  MIMD  multiprocessor.  Two  storage  schemes  are 
considered:  cyclic  storage  scheme  and  consecutive  storage  scheme.  We  will  show  that  the 
consecutive  storage  scheme  implements  the  bidiagonal  reduction  much  more  efficiently. 

The  main  idea  behind  the  Algorithms  6.1  and  6.2  is  to  reduce  the  entries  of  the 
vectors  x  and  y  in  opposite  order  and  to  chase  the  bulge  in  opposite  direction,  upper-left 
corner  for  the  large  block  and  lower-right  corner  for  the  small  block.  This  is  based  on 
the  two-way  chasing  scheme  [125],  which  was  also  used  in  [116]  with  k  =  to/ 2  in  the 
context  of  updating.  The  algorithm  simply  reduces  the  large  and  small  blocks  to  almost 
bidiagonal  form  (see  the  12-th  matrix  in  Fig.  6.2)  by  ordinary  chasing  scheme,  and  uses 
2x2  updating  and  downdating  algorithms  to  eliminate  x,  and  y  ,  followed  by  one  step 

K  1 

of  the  qd  procedure  on  the  small  block  to  reduce  it  to  the  upper  bidiagonal  matrix.  The 
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entire  reduction  steps  for  Algorithm  6.1  requires 


Step  1 


Step  3 


k(k  —  1)  +  (n  —  k)(n  -  k  —  1)  +  (n  —  k  —  1) 
=  r?  —  2  nk  +  2  k^  —  (k  +  1) 


(7.1) 


plane  rotations. 

The  algorithm  allows  simultaneous  bidiagonal  reductions  on  both  large  and  small 

blocks,  B  and  B  (Step  1  in  Algorithm  6.1)  since  they  do  not  share  any  data  throughout 

the  reduction  steps.  Following  similar  notations  used  in  [1],  we  denote  F.  .  as  Givens 

*>  3 

plane  rotations  operating  on  rows  i  and  j  (left  rotations),  and  G .  .  as  those  operating 
on  columns  i  and  j  (right  rotations).  Then  from  the  dependency  graph  of  this  algorithm 
depicted  in  Fig.  7.1,  we  see  that  G  and  G  ,  the  first  rotations  for  each  block,  can 

1,2  71  —  l,Tl 

be  executed  in  parallel,  and  the  sequence  of  the  rotations  that  follows  are  also  carried 
out  in  parallel.  Hence,  the  whole  reduction  only  takes 


ifc-3 

2  +  3  +  -  .  +  3+2 (k  -l)  =  5k-9  if  k  >  (7.2) 

k- 3 

2  +  3  +  —  +  3  +2(n  —  k  —  1)  +  1  =  5n  —  5k  —  8  if  k  <  (7.3) 

time  steps.  Obviously,  the  algorithm  achieves  an  optimal  performance  when  k  ~  n/2. 

7.2  Overview  of  Connection  Machine 

A  Connection  Machine  (CM-5)  system  can  have  up  to  16K  physical  processors 
or  processing  nodes  (PNs).  The  CM-5  has  two  interprocessor  communications  networks: 
data  network  and  control  network.  The  control  network  is  used  for  global  operations 
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such  as  synchronization  and  broadcasting.  The  data  network,  which  uses,  so-called,  4~ 
ary  fat  tree  [72],  supports  operations  for  data  transfers  from  a  single  source  to  a  single 
destination. 

The  CM-5  supports  both  SIMD  (Single  Instruction  Multiple  Data)  and  MIMD 
(Multiple  Instruction  Multiple  Data)  programming  models  [66].  In  the  SIMD  model,  the 
data  parallel  programming  associates  one  PN  with  each  element  of  a  data  set.  All  PNs 
execute  identical  operations,  each  operating  on  data  stored  in  its  local  memory,  accessing 
data  stored  in  the  local  memory  of  other  PNs,  or  receiving  data  from  the  host  computer. 

In  the  MIMD  programming  model,  each  node  has  its  own  copy  of  the  same  pro¬ 
gram  called  node  program,  and  executes  the  program  asynchronously.  The  communi¬ 
cation  between  the  nodes  is  usually  done  by  utilizing  a  set  of  efficient  communication 
routines  contained  in  the  CM  message-passing  library,  CMMD.  For  our  implementation 
we  chose  the  MIMD  model  because  the  rotations  at  each  time  step  are  different  in  terms 
of  their  types  (left  or  right)  and  the  data  required. 

In  CM-5  a  packet  of  size  20  bytes  is  used  for  nodal  communications.  First  four 
bytes  are  used  for  control  purposes  and  the  rest  of  16  bytes  contain  the  data.  If  a 
packet  is  full,  that  is,  if  it  contains  16  bytes  of  user  data,  the  overhead  of  processing  it  is 
smaller  than  the  message  of  different  sizes.  Therefore,  the  communication  overhead  will 
be  smaller  if  a  user  made  the  message  size  a  multiple  of  16  bytes  [90]. 

Moreover,  a  cluster  is  composed  of  four  processing  nodes,  and  the  nodes  with  the 
same  cluster  share  a  common  switching  node  capable  of  four  times  the  bandwidth  of  the 
node  at  the  leaf  level.  A  Similar  statement  is  true  for  the  nodes  as  progressing  toward 
the  root.  Each  node  must  go  through  at  least  one  switching  node  to  communicate  with 
the  other  node.  To  communicate  with  the  node  in  a  different  cluster,  the  communication 
path  will  be  longer.  Hence,  it  takes  longer  to  transfer  the  data  to  the  node  within  the 
same  cluster  than  to  the  node  outside  the  cluster. 
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We  assume  the  CM-5  consists  of  p  processing  nodes,  and  denote  them  as  Node(O), 
. . . ,  NoDE(p—  1).  Here,  we  assume  that  p  is  a  power  of  two.  We  choose  one  of  the  CMMD 
programming  styles,  Host/Node  model,  where  the  host  processor  allocates  the  data  to  the 
nodes,  and  collects  the  results  for  the  analysis.  The  host  program  calls  the  node  program 
residing  in  each  node  for  the  various  tasks,  and  each  node  has  identical  node  program. 
Once  the  node  program  is  loaded  in  each  node,  it  can  be  executed  asynchronously.  Both 
host  and  node  programs  for  our  implementation  are  written  in  FORTRAN. 


7.3  Implementation  Details 

In  this  section  we  give  a  detailed  description  of  parallel  implementation  of  the 
Algorithm  6.1.  First,  we  need  the  following  definition. 

DEFINITION  7.1.  A  pair  of  left  rotations  F.  .  and  F.  or  right  rotations  G .  . 

2,1-pl  J  **7~  1  ZjZ-!”! 

and  G  .  . , ,  is  said  to  be  adjacent  if  \i  —  j I  =  2. 

We  also  use  Lred(»)  to  denote  the  sequence  of  plane  rotations  for  eliminating  x. 
and  restoring  the  resulting  matrix  into  the  upper  bidiagonal  matrix  (large  block),  and 
SRED(i)  to  denote  those  for  eliminating  and  restoring  the  matrix  into  the 

lower  bidiagonal  form  (small  block),  that  is, 


Lred(i')  =  {G 
Sred(i)  =  {G 


i,i+VFi,i+V"'Fl,2'Gl,2}*  1 

f . f  G 

n—i,n—i+V  n— i,n— i+1’  ’  n— l,n’  n— l,n 

i  =  1, . .  .,n  —  k  —  1. 


1 

h 


(7.4) 

(7.5) 


From  the  dependency  graph  shown  in  Fig.  7.1,  we  note  that  Lred(j)  and  SRED(t)  will 
always  start  and  complete  at  the  same  time  step  although  Sred(ti  —  A:  —  1),  the  last 
sequence  for  the  small  block,  will  finish  before  Lred(£  -  1)  when  k  >  n/2.  In  this  case 
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the  qd  step  of  Ft+2MyFi+u+i,-  ■  ,  F„  can  proceed  while  the  large  block  is  being 
reduced. 

There  are  at  least  two  storage  schemes  that  can  be  used  for  implementing  the 
bidiagonal  reduction:  cyclic  storage  scheme  and  consecutive  storage  scheme. 

7.3.1  Basic  Procedures 

In  Table  7.1  we  describe  communication  primitives  for  the  node-to-node  commu¬ 
nications. 


Table  7.1.  Communication  Primitives 


Primitives 

Description 

CMMD  Routines 

send  (nodelist  ;outlist) 

Send  variables  in  outlist  to 
each  node  in  nodelist 

CMMD_send_block 

recv(i;inlist) 

Receive  variables  in  inlist  from 
NODE(i) 

CMMD_xeceiveJblock 

s wap  ( i;  inlist  ;outlist) 

Exchange  variables  in  outlist 
with  inlist  of  NODE(i) 

CMMD .swap 

send_and_recv 

( i,j;inlist;outlist ) 

Send  variables  in  outlist  to 
Node(j’)  and  receive  vari¬ 
ables  in  inlist  from  Node(z) 
simultaneously 

CMMD_send_and_receive 

Note  that  all  of  CMMD  communication  routines  used  are  blocking  version,  that 
is,  each  node  waits  until  it  finish  sending  or  receiving  the  data  without  proceeding  to  the 
next  executable  code.  This  ensures  that  each  node  carries  out  the  rotation  with  correct 


data  as  we  will  see  in  the  next  section. 


7.3.2  Cyclic  Storage  Scheme 


Suppose  we  store  the  data  rowwise,  so  that  Node(i)  contains  all  of  nonzero  entries 

on  the  row  i  such  as  the  diagonal  entry,  and  the  nonzero  entries  created  by  the  chasing 

steps.  Then  we  immediately  observe  that  this  scheme  would  not  give  a  full  parallelism 

among  the  nodes.  For  instance,  at  time  Step  6  in  Fig.  7.1  (sixth  matrix  in  Fig.  6.2), 

G  and  G  cannot  be  executed  in  fully  parallel  fashion  because  Node(2)  has  all 
1,2  3,4 

three  elements  in  the  second  row,  but  (2,3)  entry  is  also  used  in  processing  4,  so  that 
Node(2)  has  to  communicate  with  both  Node(1)  and  Node(3).  In  fact,  any  adjacent 
pair  of  right  rotations  would  cause  similar  difficulties  when  storing  the  data  rowwise. 

However,  this  problem  can  be  completely  avoided  by  storing  the  data  columnwise. 
We  show  this  using  the  following  proposition. 

PROPOSITION  7.1.  Suppose  n  is  even.  Then  if  k  ^  n/2,  the  dependency  graph  for 
Algorithm  6.1  shown  in  Fig.  7.1  can  contain  no  adjacent  pair  of  left  rotations  at  any 
time  step. 

Proof.  We  only  consider  the  reduction  steps  for  large  block  since  the  same  ar¬ 
gument  applies  to  the  small  block.  Let  G .  .  ..  and  G .  n  .  0  be  the  first  rotations  in 
Lred(z)  and  LRED(i  +  1),  respectively.  Then,  we  see  that  of  Lred(i  +  1) 

is  executed  after  completing  the  rotations  G.  .  , F.  , F.  .  .  of  LRED(t),  i.e.,  when 

Z  jZ  j  X  Z  ,Z“r  -I  Z  X  ,Z 

G .  .  of  Lred(i)  is  executed.  So,  their  indices  differ  by  two.  Since  the  reduction  pat- 

Z  1  ,z 

terns  proceeds  as  GFFGFG  ■  •  •  FG,  and  the  indices  for  the  rotations  decrease  by  one 
for  every  pair  of  F  and  G ,  the  indices  of  subsequent  rotations  Lred(i)  and  the  ones  in 
Lred(i  +  1)  differ  by  at  least  two.  Therefore,  it  is  impossible  to  have  F{  /+1  in  Lred(i) 
and  F  in  LRED(i  +  1),  where  \l—j\  =  2  at  the  same  step.  For  LRED(i)  and  Lred(j'), 
where  |i  —  j\  >2,  the  result  is  more  obvious.  □ 


If  n  is  odd,  we  have  the  same  result  regardless  of  the  value  of  k.  In  fact,  if  n 

is  even  and  k  =  n/2,  it  is  possible  to  have  two  adjacent  left  rotations,  namely,  F. 

k—l,k 

and  -f^+1  £_|_2  in  the  same  time  step.  But,  the  reduction  step  in  this  case  will  have  the 
following  form: 


However,  since  there  is  no  data  dependency  among  the  nodes,  they  can  be  executed  in 
parallel.  Therefore,  Node(j)  stores  all  of  nonzero  entries  of  j-th  column  including  the 
entries  of  x  and  y,  that  is,  we  partition  the  set  of  nodes  {0, 1, . .  .,p  —  1}  into 


{0,1,... ,P0-  1}, 


=  {*Vpo  +  1’ 


■iP  -  1} 


(7.6) 


where  p^q  <  k  <  (pQ  +  1)?.  Here,  q  =  n/p. 

Node (j),j  G  P{,  stores  columns  /  <  k  where 


j +  1  =  1  (mod  pQ) 


Node(j),  j  6  P  ,  stores  columns  k  <  l  <  n  where 

S 
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Example  7.1.  Suppose  p  =  8,  n  =  16,  k  =  11.  Then  q  =  2,  P  =  {0,1, 2,3,4}  and 
Ps  =  {5,6,7}. 


j 

0 

1 

2 

3 

4 

5 

6 

7 

columns 

1,6,11 

2,7 

3,8 

4,9 

5,10 

12,15 

13,16 

14 

7.3. 2.1  Chasing  Patterns 

Before  we  consider  designing  the  code  for  the  reduction  steps,  we  need  to  catego¬ 
rize  possible  chasing  patterns.  All  of  the  reduction  steps  for  the  large  block  in  Fig.  6.2 
fall  in  one  of  the  chasing  patterns  described  in  Fig.  7.2. 


II  II 


a 
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x) 

y 

y) 

\y 

0/ 

Fig.  7.2.  Chasing  Patterns 


The  patterns  l^-l*  correspond  to  the  large  block,  and  to  the  small  block. 

Note  that  patterns  l.  is  symmetric  to  s.  simply  because  the  types  of  resulting  bidiagonal 
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matrices  and  the  chasing  direction  for  each  block  are  exactly  opposite.  Here,  b's  represent 
the  diagonal  entries,  a’s  super  or  subdiagonal,  and  ePs  bulges.  Then,  we  see  that  Lred(*) 
and  SRED(t)  have  chasing  patterns,  respectively, 


2(*-1)  2(t— 1) 

^1^2’  V  *4’  l3 ’  l4 ’  ‘ ’  V  V  and  ^  V  V  V  V  V  V  ‘  *  •  ’  V  V' 


Note  that  G ^  has  l i  for  Lred(1)  with  a  undefined  and  l ^  for  Lred(z),  i  >  1.  Similarly, 
G  i  has  for  Sred(1)  with  a  undefined  and  s^  for  Sred(z),  i  >  1. 

7.3. 2. 2  Host  Program 

The  host  program  distributes  the  data  among  the  nodes,  coordinates  the  order  of 
the  reductions,  and  initiates  the  reduction  process.  The  CMMD  routine  CMMD_distrib_ 
to_nodes  provides  efficient  ways  of  allocating  tr.  and  z.  into  the  local  variables  b  and  z 

XX 

of  Node(i). 

Except  for  the  Lred(1)  and  Sred(1),  the  host  program  initiates  the  subsequent 

reductions  at  every  three  time  steps,  i.e.,  Lred(i)  Sred(ti  -  i  +  2)  begin  at  time  step 

3  *  (i  —  1).  This  can  be  done  by  sending  a  signal  to  Node(z)  and  Node(«  —  i )  as  soon 

as  receiving  the  message  from  NoDE(i  —  2)  which  just  finished  processing  the  rotation 

F,  .  . 

«— 2,»— 1 

Upon  the  completion  of  Step  1  of  Algorithm  6.1  by  the  nodes,  the  host  program 
calls  the  subroutines  which  will  perform  2x2  updating  or  downdating  and  one  step  of 
the  qd  process  to  complete  the  bidiagonal  reduction.  This  step  of  the  qd  process  is 
completely  serial  unless  k  >  n/2.  If  it  is  the  case,  Node(&  +  1),  . . .,  and  NoDE(n)  can 
carry  out  the  qd  step  while  the  reduction  on  the  large  block  is  performed. 
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7.3.2. 3  Node  Program 

The  CM-5  allows  a  single  node  program  for  all  the  nodes.  However,  we  need  to 
design  the  code  so  that  a  single  program  can  handle  multiple  chasing  patterns  at  the 
same  time.  Moreover,  each  chasing  pattern  needs  the  data  from  two  or  three  nodes,  and 
these  nodes  must  know  in  advance  what  types  of  operations  to  perform.  For  instance, 
all  patterns  except  for  l  and  s  require  two  nodes  to  do  the  job,  but  l  and  s  need  three 
columns,  and  so  require  three  nodes  to  get  involved  in  the  computation.  To  this  end, 
each  node  keeps  a  local  variable  op  which  is  continually  modified  at  every  time  step. 
The  value  of  op  of  Node(z)  is  determined  by  Node (i  +  1)  for  reducing  the  large  block 
and  by  Node(z  -  1)  for  reducing  the  small  block  as  we  will  see  shortly. 

The  node  programs  for  the  corresponding  patterns  are  given  in  Table  7.2.  A 
unique  value  of  op  is  assigned  to  each  segment  of  the  node  program.  The  value  of 
op  determines  which  operation  each  node  should  perform  at  a  given  time  step,  and  each 
node  executes  only  the  part  of  the  code  marked  by  its  current  value  of  op  . 

The  host  program  ’wakes  up’  the  Node(z)  and  NoDE(n  -  i)  always  with  op  = 
1  and  op  =  11,  respectively,  to  start  Lred(z)  and  Sred(z)  because  they  begin  with 
eliminating  x.  and  y  ,  .  .  For  instance,  let  us  consider  Lred(3)  =  {G  ,  F  .  F  . 

x  n— «— 1+1  3,4  3,4  2,3 

G  ,  F  ,G  },  which  has  chasing  patterns  {/,/,/,/,/,/}.  Then,  Node(3)  receives 

2,3  1,2  1,2  1  2  3  4  3  4 

the  value  of  op  =  1,  and  it  immediately  signals  Node(4)  with  op  -  5  to  carry  out  the 
pattern  l  .  Upon  the  completion  of  /  ,  Node(3)  and  Node(4)  increment  their  values  of 
op  by  one  to  continue  on  to  the  next  pattern  /  .  At  this  point  it  is  not  necessary  for 
Node(3)  to  signal  Node(4)  to  specify  the  types  of  operations.  When  /  is  completed, 
Node(3)  again  increments  its  value  op  by  one  and  signals  Node(2)  with  the  updated 
value  of  op  to  start  the  rotation  F  with  pattern  l  .  Then,  Node(2)  signals  Node(3) 
with  op  =  7  and  Node(4)  with  op  =  9,  and  all  three  nodes  execute  parts  of  the  code 


Table  7.2.  Node  Programs  Using  the  Cyclic  Storage  Scheme 


Pattern 

Node(*) 

Node(j  + 1) 

'i 

/*  op  =  1  */ 

swap(i  +  1;  a,  6,  x;  6,  x) 

formrot(x,x,c,s) 

a  <—  c  *  a 

b  <—  c*b 
d< - s  *  6 

/*  op  =  5  */  _  _ 
swap(i;  6,  x;  a,  6,  x) 
formrot(x,x,  c,  s) 
d  < - s  *  a 

a  s  *  b 
b  <—  c*b 

/*  op  =  2  V 
formrot(6,  d,  c,  s) 
send(z  +  l;c,«) 

/*  op  =  6  */ 

recv(i;  c,  s) 
applyrot(a,6,c,  s,l) 

/*  op  =  3  V 

/*  op  =  7  */ 

'a 

recv(z  +  2;c,s) 

6  <—  c  *  6 
d  *—  s  *  b 

recv(z  +  2;c,  s) 
applyrot(6,  a,  c,  s,  1) 

/*  op  =  4  */ 
swap(i  +  1 ;  a,  6,  <f ;  a,  6) 
formrot(6,  d ,  c,  s) 
a  <—  c  *  a 

applyrot(d,  6,  c,  s,  1) 

r  oP » s  v  . . 

swap(t;  a,  6;  a,  6,  d) 
formrot(&,<f,  c,  s) 
d  <—  s  *a 

applyrot(a,  6,  c,  s,  1) 

NoDE(i  +  2) 


/*  op  =  9  */ 
formro t(a,  d)  c ,  s) 
send(t,  i  +  1;  c,  s) 


/*  op  =  12  */ 
recv(z  +  1;  c,  s) 


/*  op  =  16  */ 
formrot(6,  d)  c,  s) 
send(i;  c,  s) 


/*  op  =  13  V 

/*  op  =  17  V 

/*  op  =  19  */ 

s 

3 

formrot(a,  d}  c,  s) 

send(z  +  1,  i  +  2;  c,  s) 

recv(z;  c,  s) 

recv(z;  c,  s) 

applyrot(6,  a,  c,  s,  1) 

d  4—  $  *  6 

/*  op  =  14  */ 


/*  op  =  18  */ 


swap ( t  +  1 ;  a,  6;  a,  6,  d)  swap(i;  a,  6,  d\  a,  6) 
formrot(6,  d ,  c,  s)  formrot(fc,  d ,  c,  s) 

applyrot(a,  6,  c,  s,  1)  applyrot(a,  fc,  c,  s,  1) 
cf  s  *  a  a  <—  c  *  a 
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according  to  the  value  of  op  .  When  this  step  is  finished,  Node(2)  signals  the  host  node 
to  start  Lred(4),  and  it  moves  on  to  the  next  pattern  /  which  can  be  done  exactly  the 
same  way.  Similarly,  we  process  the  repeated  patterns  and  /  in  the  same  fashion. 
The  node  program  for  the  cyclic  storage  scheme  is  described  in  the  Appendix. 

7.3.3  Consecutive  Storage  Scheme 

In  this  scheme  consecutive  blocks  of  the  bidiagonal  matrix  are  stored  in  each  node. 
We  partition  the  set  of  nodes  exactly  the  same  way  as  in  (7.6)  Using  Matlab  notation, 
Node(j)  stores 

jq  +  1:  (j  +  1  )q  \i  j  <  pQ- \ 

(P0  ~  1)9  +  1:  k  if  j  =  P0  -  1 

n:  —1:  n  —  q  +  1  li  p^<  j  <  p  —  1 

n  —  (p  —  —  l)g:  —1:  k  +  1  if  j  =  p  —  1 

columns  of  B. 

Denote  n  (j)  as  the  number  of  columns  which  the  Node(j')  has  in  its  memory. 

C  v 

Then,  we  have 


n  (j)  =  q,  i  =  0,...,p  -2,p  ,p  +  l,...,p  — 2 

c  u  u  u 

«c(p0_1)  =(!-P0)9  +  fc 
n(p-  1)  ~{n-k)-{p-p-l)q. 

c  u 

Example  7.2.  Suppose  p  =  8,  n  =  16,  k  =  11.  Then  q  =  2,  P{  =  {0,1, 2,3,4}  and 
P  =  {5,6,7}.  Each  node  has  the  following  columns: 
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3 

0 

1 

2 

3 

4 

5 

6 

7 

columns 

1,2 

3,4 

5,6 

7,8 

9,10,11 

16,15 

14,13 

12 

n,(i) 

2 

2 

2 

2 

3 

2 

2 

1 

Note  that  the  last  n—k  columns  are  stored  in  the  nodes  in  P  in  reverse  order.  This 

S 

ordering  makes  it  possible  for  the  nodes  in  P  and  P ^  to  have  identical  node  programs  for 
the  bidiagonal  reductions  on  the  large  and  small  blocks.  From  Fig.  6.2,  we  see  that  the 
bidiagonal  reductions  on  these  two  blocks  are  exactly  opposite,  so  that  the  reduction  on 
the  small  block  can  be  done  by  reversing  the  order  of  the  diagonal  entries  and  y,  reducing 
the  block  exactly  the  same  way  as  the  large  block,  and  again  reversing  the  diagonal  and 
bidiagonal  entries  when  completed.  Therefore,  we  only  describe  the  reduction  steps  (6.8), 
that  is,  only  for  the  nodes  in  P^ 

As  in  cyclic  storage  scheme  Node (j)  contains  variables  a[i] ,  b  [i] ,  z  [i]  ,  i=l , . . . , 
n  (j)  to  store,  respectively,  the  subdiagonal  or  superdiagonal  entries,  diagonal  entries, 
and  z..  As  mentioned  before,  the  orders  of  array  elements  are  reversed. 

Following  the  notation  in  (7.4),  Node(O)  initiates  Lred(i'),  1,  sequen¬ 

tially.  Then  it  needs  to  communicate  with  Node(1)  to  execute  the  chasing  pattern 
of  Lred(<jt),  reducing  the  entries  of  z.  The  rest  of  LRED(g)  and  starting  Lred(<7  +  1), 
which  is  the  responsibility  of  Node(1),  are  done  simultaneously.  We  repeat  this  process 
until  all  of  the  nodes  will  have  finished  their  portion  of  the  bidiagonal  reduction. 

Let  us  call  a  node  which  initiates  a  Lred(i)  for  some  i  at  a  given  time,  the  master 
node.  A  slave  node  is  the  node  which  once  became  the  master  node,  but  now  has  task 
of  chasing  bulges  as  far  as  it  can.  Hence,  at  any  time,  there  can  exist  only  one  master 
node,  say,  Node(j),  for  some  j  <  p  —  1,  and  j  slave  nodes,  Node(O),  . . .,  Node(j  -  1). 
The  rest  of  the  nodes  stay  idle.  Note  that  Node(pq  -  1)  never  becomes  a  slave  node. 

A  pseudo-code  for  the  node  program  is  given  in  Fig.  7.3. 
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/*  We  assume  buf  contains  b[q-l],  b[q],  a[q],  z[q]  .  */ 

/*  Node(O)  is  the  master  */ 
if  myid  ==  0  then 

for  i  =  1  to  numcol-1 
reduce(i) ; 
end; 

send(myid+l,  buf);  recv(myid+l,  buf);  bchase(q,  numcol) ; 
for  i-q  to  k-q  /*  Now  it’s  a  slave  */ 

send(myid+l,  buf);  recv(myid+l,  buf);  bchase(q,  numcol); 
end; 

/*  Node(idhi)  is  the  master  */ 
else  if  myid  ==  idhi  then 

recv(myid-l,  buf);  redxnode(a,  b,  d,  z,  buf);  send(myid-l,  buf); 
for  i  =  1  to  numcol-1 

reduced);  recv(myid-l,  buf);  chxnode(a,  b,  d,  buf); 
send(myid-l,  buf); 
end ; 

/*  Node(l),  ...  ,  Node(idhi-l)  */ 
else 

recv(myid-l,  buf);  redxnode(a,  b,  d,  z,  buf); 
send(myid-l,  buf); 

for  i  «  1  to  numcol-1  /*  It's  now  the  master  */ 

reduced);  recv(myid-l,  buf);  chxnode(a,  b,  d,  buf); 
send(myid-l,  buf); 
end; 

/*  reduce  across  the  node  */ 

send(myid+l,  buf);  recv(myid+l,  buf);  bchase(q,  numcol); 
for  i  -  (myid+l)*numcol  to  k-1  /*  It’s  now  a  slave  */ 
send.and_recv(myid-l,  myid+1,  bufin,  buf out) ; 
chxnode(a,  b,  d,  bufin); 

send_and_recv(myid+l ,  myid-1,  bufout,  bufin); 
bchase(q,  numcol); 
end; 
end; 


Fig.  7.3.  Node  Program  for  the  Consecutive  Storage  Scheme 
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Here,  idhi  =  —  1  and  numcol  =  n^(j).  The  function  reduce(i)  performs  the 

following  transformation: 

l  I 


/.  ^ 

/.  \ 

fc  a 

b  a 

l  l 

i  l 

6  a 

b  a 

2  2 

2  2 

6.  a.  , 

=> 

b.  a  d 

i-i  i-i 

i-l  i-i 

b.  0 

b.  a 

t 

i  t 

6.  , 

e  b 

«+i 

t+l 

z.  z.  . 

z 

.  .+i^ 

i+i) 

followed  by  b chase (i) 


1  l 


bi  a i 


b 

2  2 


b  a  d 

*-i  i-i 


e  b 


t+ 1 


t+l 


6i  5i 


6 

2  2 


b  a 

t-i  i-i 


6.  a 

t  i 


t+i 


Zt+i 


redxnode  carries  out  the  sequence  of  pattern  14  4.  and  chxnode  144. 

n  r  12  3’  3  4  3 
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7.3.4  Computation  Cost 

Let  tmitai  and  t5  be,  respectively,  the  time  required  for  one  multiplication,  one 
addition,  and  one  square  root.  Then  the  time  required  for  (6.8)  is 


(7.7) 


where  T  ,T  and  T  (i)  are,  respectively,  the  time  required  for  formrot,  applyrot,  and 

g  a  reei 

reduce  (i).  Since 


T 

9 

=  10T«  +  2ta 

(7.8) 

T 

a 

=  4rM  +  2TA 

(7.9) 

TrJ» 

=  (2 j  -  3 )T  +  (2j  -  3)7  +  (4j  -  10)rM, 

(7.10) 

the  time  required  for  reducing  the  large  block  to  bidiagonal  form  is 

T{k)  =  (16fc2  +  34 k  -  196)tm  +  (4k2  +  8 k-  38)r^ 

(7.11) 

Moreover,  2x2  updating  or 

downdating  requires 

T  =  < 
22 

r 

12r  +  r  +  r  for  updating 

MAS 

9 r  +  6r  +  6r  for  downdating 
k  M  A  S 

(7.12) 

and  one  step  of  the  qd  process  needs 


T  ,  =  (10rc  -  10 k  -  8 )r  +  2 (n  -  k  -  l)r  . 

qd  v  '  M  x  A 


(7.13) 
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Therefore,  total  cost  required  for  the  Algorithm  6.1  by  each  node,  is,  on  average, 

Ts  ~  +  T{-n  -  k1  +  T22  +  T,d»  <7-14) 

Suppose  k  =  n/2.  Then 

tb  =i{(2TW  +  r;B  +  ri,)} 

=  -{(32A;2  +  78 Ik  -  390)r  ,  +  (8*2  +  18*  -  75)r  +  3r  } 

Here  we  took  the  average  for 

7,3.5  Communication  Cost 

For  our  analysis,  we  ignore  any  possible  contention  problem  for  simplicity.  Let 

a  and  (3  be,  respectively,  the  startup  time  and  the  data  transfer  rate,  time  per  byte. 
ddv 

We  need  at  most  four  real*4  in  single  precision,  so  that  sending  this  message  of  size  16 

takes  a  +4*4/9  seconds.  Then  the  data  in  the  packets  used  in  communication  would 
d  d 

be  of  size  multiple  of  16,  which  makes  the  communication  the  most  efficient  on  CM-5 
[90]. 

7.3. 5.1  Cyclic  Storage  Scheme 

From  Table  7.2,  we  see  that  F  and  G  require  communication  at  every  window 

ij  t] 

step.  Hence,  from  (7.2)  and  (7.3),  the  communication  cost  becomes 


(7.15) 

(7.16) 


t 

(bk  -  9)(a  +  16/3  )  if  k  >  n/2 
(5n  -bk-  8)(arf  +  16/3  J  if  k  <  n/2 


(7.17) 
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seconds,  which  is  impractical  on  a  coarse-grain  distributed-memory  MIMD  multiproces¬ 
sor  due  to  enormous  communication  overhead.  Moreover,  very  few  computations  are 
carried  out  by  each  node  (see  Table  7.2).  Therefore,  a  fine-grain  multiprocessor  with 
fast  communication  ability  would  be  required  to  make  this  storage  scheme  practical  for 
bidiagonal  reduction. 

7.3. 5. 2  Consecutive  Storage  Scheme 

Since  Node(O)  initiates  and  terminates  the  bidiagonal  reduction  for  the  large 
block,  the  total  communication  cost  for  this  scheme  is  determined  by  the  number  of 
send  and  recv  calls  done  by  Node(O).  For  reducing  the  first  q—  1  columns  of  B,  which 
is  done  by  Node(O),  no  communication  is  required.  As  a  slave  node  Node(O)  needs  to 
communicate  k  —  q  +  1  times  with  Node(1)  to  chase  the  bulges. 

Moreover,  when  k  <  n/ 2,  Node(po)  makes  extra  effort  to  complete  bidiagonal 
reduction  by  applying  one  step  of  the  qd  process  which  requires  one  more  communication 
step.  When  k  >  n/2,  nodes  in  P  can  carry  out  the  qd  step  while  the  large  block  gets 
reduced  by  the  nodes  in  P .  Therefore,  the  total  communication  cost  for  this  storage 
scheme  becomes 


(k  -  q  +  2)(a  +  16/3  )  if  k  >  n/2 

d  d 

{(n  -  k)  —  q  +  1  +  (|P  |  —  l)}(a  +  16/3  )  if  k  <  n/2 

s  d  a 


(7.18) 


seconds.  Here,  IP  I  denotes  the  number  of  nodes  in  P  .  Note  that  T  becomes  worst 
when  k  <  n/2  because  the  reduction  on  the  large  block  is  already  completed  before  the 
qd  process  begins,  and  hence  there  is  no  parallelism  among  the  nodes  for  this  phase  of 
the  algorithm,  requiring  |P  |  -  1  additional  communication  steps. 

S 
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7.4  Timing  Results 

We  implemented  Algorithm  6.1  with  both  storage  schemes  on  CM-5  that  consists 
of  up  to  32  nodes  at  the  Northeast  Parallel  Architectures  Center  (NPAC)  at  Syracuse 
University.  Each  node  of  the  CM-5  is  a  SPARC  chip  which  runs  at  32  MHz  and  delivers 
22  Mips  and  5  Mflops.  There  is  a  64  Kb  instruction  and  data  cache  and  a  16  Mb  memory 
in  each  node  [111].  In  each  node,  there  are  two  vector  units;  each  vector  unit  is  capable 
of  peak  rate  64  Mflops. 

We  generated  an  n-by-n  random  diagonal  matrix  B  of  the  form  (1.2)  and  a  random 
n-vector  z  with  k  =  n/2,  where  n  =  64,128,256,512,1024.  We  only  implemented  the 
bidiagonal  reduction  part  for  modifying  the  SVD  described  in  Algorithm  6.1.  Computing 
the  SVD  of  a  general  full  matrix  on  a  distributed-memory  multiprocessor  is  described  in 
[71],  and  computing  the  SVD  of  the  bidiagonal  matrix  in  [33]. 

The  execution  time  for  bidiagonal  reduction  using  the  consecutive  storage  scheme 
with  different  matrix  sizes  and  various  set  of  processing  nodes  is  given  in  Table  7.3.  Here, 
we  use  the  following  definition  for  speed-up  achieved  by  our  algorithm 

„  n i) 

"  T(p) 

where  T(p)  is  time  required  to  execute  the  program  on  p  processors.  Similarly,  we  define 
the  efficiency  of  a  parallel  algorithm  as 

r_  n i) 

p  T{p)‘ 


We  observe  a  linear  speed-up  as  n  increases  with  p  fixed.  When  n  is  small,  we 
cannot  expect  any  speed-up  mainly  because  of  high  communication  cost  compared  to  the 
computation  cost.  For  instance,  when  n  =  64,  more  than  50%  of  the  total  cost  accounts 
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for  the  communication  regardless  of  the  number  of  processing  nodes  (except  for  the  case 
p  =  4).  In  fact,  when  p  =  32,  most  of  time  were  spent  in  communication.  In  this  case, 
since  the  value  of  q  is  only  two,  each  node  has  so  little  to  do,  and  therefore  the  ratio  of 
the  time  spent  in  communication  and  computation  is  quite  large. 

From  (7.18)  since  q  =  n/p,  we  see  that  as  p  increases,  so  does  T  .  However,  in 
general,  q  is  small  compared  to  k .  Hence,  the  difference  in  the  communication  cost  be¬ 
tween  different  p  is  not  significant  as  seen  in  Table  7.3.  In  general,  as  n  increases,  so  does 
g,  and  therefore,  the  ratio  of  communication  cost  and  total  cost  also  decreases.  However, 
when  p  is  small  and  n  is  large,  significant  part  of  time  were  spent  in  communication 
(p  =  4,  n  =  512, 1024;  p  =  8,  n  =  1024),  and  little  speed-up  was  gained. 

Table  7.4  shows  the  difference  in  execution  time  for  various  k  when  n  =  1024  and 
p  =  32.  Clearly,  we  achieve  an  optimal  speed-up  when  k  is  close  to  n/2.  Note  also  that 
the  bidiagonal  reduction  with  k  >  n/2  is  slightly  faster  than  that  with  k  <  n/2  mainly 
because  when  k  >  n/2,  the  qd  step  can  be  executed  in  parallel  with  the  last  chasing 
step  for  the  large  block. 

We  also  implemented  Algorithms  6.1  and  6.2  using  the  cyclic  storage  scheme. 
However,  it  was  embarrassingly  slow,  and  no  speed-up  was  gained  in  any  case.  As  we 
have  shown  in  the  previous  section,  this  storage  scheme  becomes  totally  impractical  when 
n  >  p  due  to  high  communication  overhead  caused  by  severe  contention  problems. 


Table  7.3.  CPU  Time  (in  sec)  for  the  Bidiagonal  Reduction 


V 

n 

Comp 

Comm 

Total 

%  Comm 

Speedup 

64 

0.0182 

0.0045 

0.0227 

19.8 

3.3 

128 

0.0746 

0.0086 

0.0831 

10.3 

3.5 

4 

256 

0.3032 

0.0171 

0.3203 

5.3 

3.6 

512 

1.5543 

0.1314 

1.6857 

7.8 

2.7 

1024 

5.2692 

0.3672 

5.6364 

6.5 

3.3 

64 

0.0079 

0.0078 

0.0157 

49.7 

4.8 

128 

0.0338 

0.0137 

0.0475 

28.8 

6.1 

8 

256 

0.1397 

0.0266 

0.1665 

16.0 

6.8 

512 

0.5697 

0.0529 

0.6308 

8.4 

7.2 

1024 

3.4508 

0.3929 

3.8326 

10.3 

4.9 

64 

0.0045 

0.0136 

0.0181 

75.0 

4.2 

128 

0.0214 

0.0235 

0.0449 

52.3 

6.5 

16 

256 

0.0929 

0.0417 

0.1346 

31.0 

8.5 

512 

0.3878 

0.0768 

0.4645 

16.5 

9.8 

1024 

1.2302 

0.1363 

1.3664 

9.9 

13.8 

64 

0.0013 

0.0197 

0.0210 

93.8 

3.6 

128 

0.0089 

0.0294 

0.0383 

76.8 

7.6 

32 

256 

0.0429 

0.0504 

0.0934 

54.0 

12.3 

512 

0.1875 

0.0903 

0.2778 

32.5 

16.4 

1024 

0.7824 

0.1537 

0.9361 

16.2 

20.1 

Table  7.4.  CPU  Time  (in  sec)  with  Various  k  (n  =  1024,  p  =  32) 


k 

Total 

Speedup 

k 

Total 

Speedup 

64 

1.9039 

9.9 

960 

1.8818 

9.9 

128 

1.7741 

10.6 

896 

1.7542 

10.7 

256 

1.5138 

12.4 

768 

1.4999 

12.5 

512 

0.9361 

20.1 
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Chapter  8 

Conclusion 

We  have  presented  efficient  algorithms  for  modifying  the  TSODs,  and  shown  that 
the  TSOD  can  be  updated  and  downdated  in  0(n  )  flops  in  a  manner  that  preserves  its 
structure.  The  backward  error  analysis  and  perturbation  theory  show  that  the  proce¬ 
dures  satisfy  a  blockwise  stability  property.  Thus  if  our  interest  is  in  separating  the  two 
subspaces  associated  with  the  large  and  small  singular  values,  we  will  obtain  answers 
that  are  as  good  as  can  be  expected.  The  use  of  this  perturbation  theory  shows  that 
we  achieve  more  accuracy  in  the  singular  values  and  more  orthogonality  in  the  singular 
vectors  that  result  from  our  update  procedures.  Our  numerical  tests  show  some  im¬ 
provement  in  the  accuracy  of  downdated  singular  values  using  our  algorithm  instead  of 
general  chasing. 

Our  approach  to  modifying  the  ULVD  is  particularly  promising.  It  is  simple  to 
implement  for  both  updating  and  downdating,  and  preserves  the  rank-revealing  struc¬ 
ture  often  without  the  deflation  process  to  compute  the  numerical  rank  of  the  matrix. 
Moreover,  one  can  efficiently  track  the  size  of  each  block  of  lower  triangular  part  of  factor¬ 
ization  for  an  accurate  monitoring  of  the  condition  of  downdating  problem.  Furthermore, 
data  independence  among  the  blocks  makes  the  algorithms  parallelizable. 

We  also  have  given  algorithms  for  rank-one  updates  and  downdates  of  the  SVD 
and  partially  reduced  bidiagonal  forms.  It  has  also  been  shown  that  these  algorithms 
satisfy  a  blockwise  stability  criterion  that  has  not  been  shown  for  other  algorithms.  The 
algorithms  proposed  to  allow  the  user  to  specify  a  tolerance  between  large  and  small 
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singular  values,  and  the  separation  between  the  associated  subspaces  is  preserved  by  the 
algorithm. 

Finally,  we  presented  an  efficient  method  for  modifying  the  SVD  in  parallel  to 
establish  its  practical  value  in  real  time  applications.  The  algorithm  preserves  the  block 
structures,  maintaining  the  efficiency  of  parallel  procedures  as  well.  The  consecutive 
storage  scheme  outperforms  the  cyclic  storage  scheme  in  bidiagonal  reduction  due  to 
high  communication  cost  of  the  latter  scheme.  The  experiments  show  the  efficiency  in 
using  the  processors  was  slightly  over  60%  for  the  consecutive  storage  scheme. 

Although  the  entire  thesis  is  devoted  to  the  problem  of  modifying  the  TSOD,  there 
are  a  number  of  unresolved  issues  and  problems.  We  suggest  a  few  in  the  following: 

•  The  stability  of  the  CSNE  approach  taken  when  downdating  is  not  possible  is  de¬ 
termined  only  by  experimental  results.  In  particular,  one  needs  to  know  how  good 
x  in  (6.17)  really  is.  A  rigorous  error  analysis  should  be  performed  by  extending 
such  results  as  those  due  to  Bjorck  [18]. 

•  Parallelizing  the  ULVD  procedures  is  much  more  challenging  than  the  SVD  pro¬ 
cedures.  Although  the  reduction  on  the  large  and  small  blocks  can  be  executed 
simultaneously,  it  is  difficult  to  enhance  parallelism  within  the  block.  As  suggested 
in  [103]  for  parallelizing  the  URVD,  we  may  require  a  fine-grain  MIMD  architecture 
for  efficient  implementation  of  systolic  arrays. 

•  A  procedure  described  by  Gu  and  Eisenstat  [56]  is  a  promising  approach  to  modify 
the  singular  vector  matrix  after  modifying  the  SVD.  They  use  an  adaptive  version 
of  one-dimensional  fast  multipole  method  [29].  However,  an  efficient  parallel  imple¬ 
mentation  for  this  acceleration  method  is  not,  as  yet,  available  although  a  parallel 
procedure  for  non-adaptive  version  has  been  developed  [45].  Together  with  our 
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parallel  bidiagonal  reduction  procedure  described  in  Chapter  7,  and  parallel  al¬ 
gorithms  for  computing  the  SVD  of  bidiagonal  matrices  [33,  65,  107],  the  SVD 
algorithm  described  in  Chapter  6  can  be  implemented  fully  in  parallel. 

•  It  would  also  be  interesting  to  extend  algorithms  for  modifying  the  ULVD  to  those 
for  modifying  the  ULLVD  for  two  matrices  [75]  as  an  approximation  to  the  gen¬ 
eralized  SVD  [84,  120].  We  also  need  to  analyze  the  stability  and  complexity  of 
the  algorithms  for  modifying  the  ULLVD  and  derive  the  error  bounds  on  the  sub¬ 
spaces  computed  by  the  ULLVD  compared  with  those  by  the  generalized  SVD.  The 
Estimator-Correlator  array  processor  [108,  109]  that  can  implement  the  estimator 
kernel  using  the  ULVD  and  the  inverse  noise  kernel  using  the  ULLVD  would  be  a 
practical  application. 
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Appendix 

Node  Program  for  Bidiagonal  Reduction 
with  Cyclic  Storage  Scheme 


This  node  program  will  reduce  the  arrow-head  matrix  to 
the  bidiagonal  matrix  using  cyclic  storage  scheme 

Variables  used: 

abdz  —  contains  values  of  a,  b,  d,  and  z  in  that  order 
myid  —  node  id 

idpl  —  myid  +  1 

idp2  —  myid  +  2 

idml  —  myid  -  1 

idm2  —  myid  -  2 

hid  —  host  id 

cs  —  angles  for  the  rotations 
inrngl  —  processing  the  large  block 
inrng2  —  processing  the  small  block 

.  special  case  —  first  two  rotations  in  each  block 

100  continue 

again  -  .true. 

50  continue 

ret  *  CMMD_receive_block(src,  tag,  op,  isize) 

150  continue 

go  to  (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19),  op 

1  ...  op  =  1 

2  ...  op  =  2 

3  continue 

ret  =  CMHD.send.block(idpl,  tag,  7,  isize) 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 
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ret  =  CMMD_send_block(idp2,  tag,  9,  isize) 

ret  =  CMMD_receive_block(idp2,  tag,  cs,  angsiz) 

abdz(3)  =  cs(2)*abdz(2) 

abdz(2)  =  cs(l)*abdz(2) 

op  =  4 

it  =  it  +  1 

go  to  200 

4  continue 

ret  =  CMMD_send_block(idpl,  tag,  8,  isize) 
ret  =  CMMD_swap(idpl,  abdzt,  bufsiz,  abdz,  bufsiz) 
call  formrot(abdzt(2) ,  abdz (3),  cs(l),  cs(2)) 
abdz(l)  =  cs(l)*abdz(l) 

call  applyrot (abdzt ( 1) ,  abdz(2) ,  cs(l),  cs(2),  1) 
op  *  3 
it  =  it  +  1 
go  to  200 

5  ...  op  =  5 

6  ...  op  =  6 

7  continue 

ret  =  CMMD_receive_block(idpl,  tag,  cs,  angsiz) 
call  applyrot (abdz (2) ,  abdz(l),  cs(l),  cs(2),  1) 
it  =  it  +  1 
go  to  100 

8  continue 

if  (inmgl  .and.  it  .eq.  maxit)  then 

ret  =  CMMD_send_block(hid,  tag,  20,  isize) 
end  if 

ret  =  CMMD_swap(idml ,  abdzt,  bufsiz,  abdz,  bufsiz) 
call  formrot(abdz(2) ,  abdzt(3),  cs(l),  cs(2)) 
abdz(3)  =  cs(2)*abdzt(l) 

call  applyrot (abdz ( 1) ,  abdzt(2),  cs(l),  cs(2),  1) 
it  -  it  +  1 
go  to  100 

9  continue 

call  formrot(abdz(l) ,  abdz(3),  cs(l),  cs(2)) 
ret  =  CMMD_send.block(idm2,  tag,  cs,  angsiz) 
ret  =  CMMD_send_block(idml ,  tag,  cs,  angsiz) 
it  =  it  +  1 
go  to  100 


% 


4 
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10  ...  terminate 

go  to  999 

200  continue 

if  (again)  then 

again  =  .not.  again 
go  to  150 
else 

if  (my id  .ne.  0)  then 

ret  =  CMMD_send_block(idml,  tag,  op,  isize) 
else 

ret  =  CMMD_send_block(hid,  tag,  op,  isize) 
end  if 
go  to  100 
end  if 

999  continue 


similarly  for  the  small  block 


